[HN Gopher] Your computer is a distributed system ___________________________________________________________________ Your computer is a distributed system Author : carlesfe Score : 173 points Date : 2022-03-30 14:07 UTC (8 hours ago) (HTM) web link (catern.com) (TXT) w3m dump (catern.com) | jmull wrote: | > This is something unique: an abstraction that hides the | distributed nature of a system and actually succeeds. | | That's not even remotely unique. | | OP is grappling with "the map is not the territory" vs. maps have | many valid uses. | | Abstractions can be both not accurate in every context and 100% | useful in many, many common contexts. | | Also (before you get too excited), abstractions have quality: | there are good abstractions -- which are useful in many common | contexts -- and bad abstractions -- which overpromise and turn | out to be misleading in some or many common contexts. | | I'll put it this way: the idea that _The Truth_ exists is a rough | (and not particularly useful) abstraction. If you have a problem | with that, it just means you have something to learn to engage | reality more fruitfully. | sesuximo wrote: | I think there's a big difference which is that your computer is | allowed to crash when one component breaks whereas a distributed | system is typically more fault tolerant. | uvdn7 wrote: | This is actually what makes handling the distributed system in | a single computer easier - everything crashing together makes | it an easier problem. | | E.g. you have multiple CPU cachelines, caching different values | of a main memory location. And there are different cache | coherence protocols to keep them sane. But cache coherence | protocols never need to worry about the failure mode when one | cacheline is temporarily unavailable but the others are. | | So yes, there's a distributed system in each multi-core | computer, but it's a distributed system with an easier failure | mode. | | If you like more analogies between CPU caches and distributed | systems, https://blog.the-pans.com/cpp-memory-model-as-a- | distributed-... :p | harperlee wrote: | Ideally a peripheral crashing should not crash the whole | system. | catern wrote: | And indeed it does not: Modern operating systems like Linux | can perfectly well deal with all kinds of devices crashing or | disappearing at runtime. Just like in larger distributed | systems. | amelius wrote: | Yes, and your computer is a ball of interconnected microservices | too. | JL-Akrasia wrote: | You are also a distributed system. | __turbobrew__ wrote: | Some more distributed than others | ilaksh wrote: | This proves that conventional wisdom (such as the idea that | abstracting distributed computation is unworkable) is often | wrong. | | What happens is enough people try to do something and can't quite | get it to work quite right that it eventually becomes assumed | that anyone trying that approach is naive. Then people actively | avoid trying because they don't want others to think they don't | know "best practices". | | Remember the post from the other day about magnetic amplifiers? | Engineers in the US gave up on them. But for the Russians, mag | amps never became "unworkable" and uncool to try, and they | eventually solved the hard problems and made them extremely | useful. | | Technology is much more about trends and psychology than people | realize. In some ways, so is the whole world. It seems to me that | at some level, most humans never _really_ progress beyond middle- | school level. 
| | The starting point for analyzing most things should probably be | from the context of teenage primates. | andrey_utkin wrote: | Your body is a distributed system. Your brain is a distributed | system. A live cell is a distributed system. A molecule is a | distributed system. In other news, water is wet. | rbanffy wrote: | This was true for several home computers since the late 70's. | Atari 8-bit computers had all peripherals connecting via a serial | bus, each one with its own little processor, ROM, RAM and IO (the | only exception, IIRC, was the cassette drive). Commodores also had | a similar design for their disk drives. A couple months back a | 1541 drive was demoed running standalone with custom software and | generating a valid NTSC signal. | Frenchgeek wrote: | ( https://youtu.be/zprSxCMlECA ) | dkersten wrote: | Wow, that is cool! | catern wrote: | Wow! Reminds me of https://www.rifters.com/crawl/?p=6116 | | A hydrocephalic demo! | rbanffy wrote: | I think that plan hits a wall for heat dissipation and | nutrient/oxygen consumption - not sure we have lungs large | enough to keep a brain doing 10x more computation | oxygenated, nor perspiration glands to keep it cool. | | But I'd be totally in for a 10% increase in IQ in exchange | for being able to eat 10% more sugar. | YZF wrote: | well, it's been true ever since a wire has connected any two | bits. The processor, ROM, RAM are all "distributed" systems | internally. | rbanffy wrote: | That's not what "distributed system" means. | YZF wrote: | What's your definition of "distributed system" then? | | Two flipflops interconnected on one wafer. Two flipflops | interconnected on one PCB. Two flipflops interconnected with | a cable between two PCBs. These are all "distributed". | They're all subject e.g. to the CAP theorem. Sure, the | probability of one flipflop failing on the same wafer is | quite small. The probability of one flipflop failing on one | PCB is slightly larger. But fundamentally all these systems | are the same. If you have two computers on a network you | can make the probability of failure (e.g. of the network) | pretty small. | rbanffy wrote: | I start counting them as independent computers when they | have their own firmware. | TickleSteve wrote: | It absolutely is. | | The distribution of signals within smaller systems | (microcontrollers, ASICs, FPGAs, etc) makes them all distributed | systems. Ask anyone doing any kind of circuit design about | distributing clocks and clock skew, etc. | rbanffy wrote: | If you read the article, you'll understand it's about our | computers being networks of smaller computers. The SSD, | GPU, NIC, and BMC each have their own CPU, memory, and operating | system. | alexisread wrote: | There are lots of good resources in this area: The programming | language of the transputer | https://en.m.wikipedia.org/wiki/Occam_(programming_language) | | Bluebottle active objects https://www.research-collection.ethz.ch/bitstream/handle/20.... with some discussion | of DMA | | Composita components | http://concurrency.ch/Content/publications/Blaeser_Component... | | Mobile Maude (only a spec) | http://maude.sip.ucm.es/mobilemaude/mobile-maude.maude | | Kali scheme (atop Scheme48 secure capability OS) | https://dl.acm.org/doi/pdf/10.1145/213978.213986 | | Kali is probably the closest to a distributed OS, supporting | secure thread and process migration across local and remote | systems (and makes that explicit), distributed profiling and | monitoring tools, etc. It is basically an OS based on the actor | model.
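(A minimal sketch of the actor-style message passing described above, in Python rather than Scheme; the thread-per-actor, single-process setup and the class names are illustrative assumptions, not Kali's actual interface.)

    import queue, threading

    class Actor:
        """Owns private state plus a mailbox; the only way to affect it is send()."""
        def __init__(self):
            self.mailbox = queue.Queue()
            threading.Thread(target=self._loop, daemon=True).start()

        def send(self, msg):
            self.mailbox.put(msg)

        def _loop(self):
            while True:
                msg = self.mailbox.get()
                if msg is None:          # poison pill shuts the actor down
                    return
                self.receive(msg)

        def receive(self, msg):
            raise NotImplementedError

    class Counter(Actor):
        def __init__(self):
            self.count = 0
            super().__init__()

        def receive(self, msg):
            kind, reply_to = msg
            if kind == "incr":
                self.count += 1
            elif kind == "read":
                reply_to.put(self.count)

    if __name__ == "__main__":
        c = Counter()
        for _ in range(3):
            c.send(("incr", None))
        reply = queue.Queue()
        c.send(("read", reply))
        print(reply.get())               # 3
        c.send(None)

Distribution enters when send() serializes the message over a socket instead of a local queue; the calling code does not have to change, which is what makes the model attractive as an OS primitive.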
It doesn't scale massively as routing nodes was out of | scope (it connects all nodes on a bus), but that can easily be | added. | | Extremely small (running in 2mb ram), it covers all of R5rs, and | the VM has been adapted to bare metal. | | I feel that there is more to do, but a combination of those is | probably the right direction. | throwaway787544 wrote: | The thing we are missing still is the distributed OS. Kubernetes | only exists because of the missing abstractions in Linux to be | able to do computation, discovery, message passing/IO, | instrumentation over multiple nodes. If you could do _ps -A_ and | see all processes on all nodes, or run a program and have it | automatically execute on a random node, or if ( _grumble grumble_ | ) Systemd unit files would schedule a minimum of X processes on N | nodes, most of the K8s ecosystem would become redundant. A lot of | other components like unified AuthZ for linux already exist, as | well as networking (WireGuard anyone?). | ff317 wrote: | There were older attempts at this stuff, in the 90s with | "Beowulf" clusters that had cross-machine process management | and whatnot. It's a lot harder than it seems to make this | approach make sense in the real world, as the abstraction hides | important operational details. The explicit container + | orchestration abstraction is probably closer to the ideal than | trying to stretch linux/systemd/cgroups across the network | "seamlessly". It's clearer what's going on and what the | operational trade-offs are. | gnufx wrote: | > in the 90s with "Beowulf" clusters | | In case of any confusion, that sort of thing wasn't a generic | Beowulf feature, but it sounds like Bproc. I don't know if | it's still used. (The Sourceforge version is ancient.) | | https://updates.penguincomputing.com/clusterware/6/docs/clus. | .. https://sourceforge.net/projects/bproc/ | | Containers actually only make it harder to "orchestrate" your | distributed processes in an HPC system. | mnd999 wrote: | Imagine a Beowulf cluster of hot grits in soviet Russia with | CowboyNeal. | uvdn7 wrote: | Abstract a fleet of machines as single super computer sounds | nice. But how about partial failures? It's something that a | real stateful distributed system would have to deal with all | the time but a single host machine almost never deals with (do | you worry about a single cacheline failure when writing a | program?). | marcosdumay wrote: | There is a huge amount of research about distributed OSes | (really, they were very fashionable at the 90's and early | 00's). Plenty of people worked on this problem, and it's | basically solved (as in, we don't have any optimal solution, | but it won't be a problem on a real system). | NavinF wrote: | It's "basically solved" in the sense that everyone gave up | on distributed OSes and used k8s instead. | zozbot234 wrote: | K8s is doing distributed OS's on easy mode, supporting | basically ephemeral 'webscale' workloads for pure | horizontal scaling. Even then it introduces legendary | amounts of non-essential complexity in pursuit of this | goal. It gets used because "Worse is better" is a thing, | not because anyone thinks it's an unusually effective way | to address these problems. 
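(A rough sketch of the cluster-wide "ps -A" wished for above, done the naive way: fan out over ssh and merge, with unreachable nodes reported rather than hidden. The hostnames are placeholders and passwordless ssh is assumed; the point of a real distributed OS would be to make this a single local operation instead of N remote ones.)

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    NODES = ["node1", "node2", "node3"]          # hypothetical hostnames

    def remote_ps(node):
        """Return (node, pid, command) rows, or one row marking the failure."""
        try:
            out = subprocess.run(
                ["ssh", "-o", "ConnectTimeout=3", node,
                 "ps", "-A", "-o", "pid=,comm="],
                capture_output=True, text=True, timeout=10, check=True,
            ).stdout
        except (subprocess.SubprocessError, OSError):
            return [(node, "-", "<unreachable>")]  # partial failure is normal here
        return [(node, *line.split(None, 1))
                for line in out.splitlines() if line.strip()]

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
            for rows in pool.map(remote_ps, NODES):
                for node, pid, comm in rows:
                    print(f"{node:>10} {pid:>7} {comm}")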
| ohYi55 wrote: | als0 wrote: | I remember the Barrelfish OS was trying to tackle this problem | head on https://barrelfish.org/ | evandrofisico wrote: | Actually, at some point in the 2.4 kernel it was possible to | do that, with single-image systems, such as openmosix, that | handled process discovery, computation and much more, but | underneath the simple user interface it was complex, kinda | insecure, and so was eventually abandoned and never ported to newer | versions. | oceanplexian wrote: | Am I the only one who doesn't want this? | | The entire point of the UNIX philosophy (which seems to be | something they aren't teaching in software development these | days) is to do one thing and do it well. We don't need Linux | operating as a big declarative distributed system | with a distributed scheduling system and a million half-baked | APIs to interact with it, the way K8s works. If you want that | you should build something to your specific requirements, not | shove more things into the kernel. | random314 wrote: | The Unix philosophy was a reasonably good model decades ago. | But I think it is over-romanticized. | | Its binary blob design is no good for security, as opposed | to a byte code design like Forth. Its user security model was | poor and doesn't help with modern devices like phones. Its | multiprocess model was ham-fisted into a multithreading model | to compete with Windows NT. Its asynchronous I/O model has | always been a train wreck even compared to NT. Its design | creates performance issues, especially in multiproc | networking code with a needless amount of memcopies. Now folks | are rewriting the networking stack in user space. Its | software abstraction layer was some simple scheme from the | 70s which has fragmented into a crazy number of implementations | now. Open source developers still complain about how much | easier it is to build a package for Windows, as opposed to | Linux. It was never meant to be a distributed system either. | Modern enterprise compute cannot scale by treating and | managing each individual VM as its own thing with clusters | held together by some sysadmin's batch scripts. | anthk wrote: | And yet Linux handles heavy I/O over | filesystems better than Windows NT. | pjmlp wrote: | Because it doesn't provide the abstraction capabilities | that NTFS allows for third parties, so naturally it is | faster doing less. | aseipp wrote: | A good paper giving a concrete example of all this is "A | fork() in the road", where you can see how an API just like | fork(2) has an absolutely massive amount of ramifications | on the overall design of the system, to the point that "POSIX | compliance" resulted in some substantial perversions of the | authors' non-traditional OS design, all of which did | nothing but add complexity and failure modes ("oh, but I | thought UNIX magically gave you simplicity and made | everything easy?"). It also has significantly diverged from | its "simple" original incarnation on the PDP-11 into a | massive, complex beast. So you can add "CreateProcess(), not | fork()" to the list of things NT did better, IMO. | | And that's just a single system call, albeit a very | important one. People simply vastly overestimate how rose-tinted their glasses are and miss all the devils in the details, | until they actually get into the nitty gritty of it all. | | https://www.microsoft.com/en-us/research/uploads/prod/2019/0... | goodpoint wrote: | Linux/UNIX does not have to turn into a mess like k8s to be | natively distributed.
Plan9 was doing it with a tiny codebase | in comparison. | pjmlp wrote: | The philosophy that is cargo culted and was never taken | seriously by any commercial UNIX. | aseipp wrote: | All of this was possible with QNX literally decades ago, and | it didn't need whatever strawman argument you're making up in | your head in order to accomplish it. QNX was small, fast, | lean, real-time, distributed, and very powerful for the time. | Don't worry, it even had POSIX support. A modern QNX would be | very well received, I think, precisely because taking a | distributed-first approach would dramatically simplify the | whole system design versus tacking on a distributed layer on | top of one designed for single computers. | | > Which seems to be something they aren't teaching in | software development these days | | This is funny. Perhaps the thing you should have been taught | instead is history, my friend. | jeffreygoesto wrote: | You mean QNet [0]? That is still alive... It is for LAN use | ("Qnet is intended for a network of trusted machines that | are all running QNX Neutrino and that all use the same | endianness."), so extra care is needed to secure this group | of machines when exposed to the internet. | | [0] https://www.qnx.com/developers/docs/7.0.0///index.html# | com.q... | | [1] https://recon.cx/2018/brussels/resources/slides/RECON- | BRX-20... | aseipp wrote: | Correct. Thought QNet itself is only one possible | implementation, in a sense (but obviously the one shipped | with QNX.) And the more important part of the whole thing | is the message-passing API design built into the system, | which enables said networking transparency, because it | means your programs are abstracted over the underlying | transport mechanism. | | "LAN use" I think would qualify roughly 95% of the need | for a "distributed OS," including a lot of usage of K8s, | frankly. Systems with WAN latency impose a different set | of challenges for efficient comms at the OS layer. But | even then you also have to design your apps themselves to | handle WAN-scale latencies, failover, etc too. So it | isn't like QNX is going to make your single-executable | app magic or whatever bullshit. But it exposes a set of | primitives that are much more tightly woven into the core | system design and much more flexible for IPC. Which is | what a distributed system is; a large chattery IPC | system. | | The RECON PDF is a very good illustration of where such a | design needs to go, though. It doesn't surprise me QNX is | simply behind modern OS's exploit mitigations. But on top | of that, a modern take on this would have to blend in a | better security model. You'd really just need to throw | out the whole UNIX permission model frankly, it's simply | terrible as far as modern security design is concerned. | QNet would obviously have to change as well. You'd at | minimum want something like a capability-based RPC layer | I'd think. Every "application server" is like an | addressable object you can refer to, invoke methods on, | etc. (Cap'n Proto is a good way to get a "feel" for this | kind of object-based server design without abandoning | Linux, if you use its RPC layer.) | | I desperately wish someone would reinvent QNX but with | all the nice trappings and avoiding the missteps we've | accumulated over the past 10 to 15 years. Alas, it's much | more profitable to simply re-invent its features poorly | every couple of years and sell that instead. | | This overview of the QNX architecture (from 1992!) 
is one | of my favorite papers for its simplicity and | straightforward prose. Worth a read for anyone who like | OS design. | | https://cseweb.ucsd.edu/~voelker/cse221/papers/qnx- | paper92.p... | Karrot_Kream wrote: | The UNIX philosophy made more sense as an abstraction for a | computer when computers were simpler. Computers nowadays | (well at least since 2006-ish) have multiple cores executing | simultaneously with complicated amounts of background logic, | interrupt-driven logic, shared caches, etc. The UNIX | philosophy doesn't map to this reality at all. Right now | there's no set of abstractions except machine code that | exposes the machine's distributed systems' in a coherent | abstraction. Nothing is stopping someone else from writing a | UNIX abstraction atop this though. | generalizations wrote: | The idea of doing one thing, and doing it well, isn't | dependent on the simplicity of the underlying system (I | imagine that PDP-11 systems seemed impressively complicated | in their time, too). The UNIX philosophy is a paradigm for | managing complexity. To me, that seems more relevant with | modern computers, not less. | | > "A program is generally exponentially complicated by the | number of notions that it invents for itself. To reduce | this complication to a minimum, you have to make the number | of notions zero or one, which are two numbers that can be | raised to any power without disturbing this concept. Since | you cannot achieve much with zero notions, it is my belief | that you should base systems on a single notion." - Ken | Thompson | icedchai wrote: | I think OpenVMS did this... in the 80's. | gnufx wrote: | Distributed computation with message passing (and RDMA) is the | essence of HPC systems. SGI systems supported multi-node Linux | single system images up to ~1024 cores a fair few years ago, | but they depend on a coherent interconnect (NUMAlink, | originally from the MPIS-based systems under Irix). | | However, you don't ignore the distributed nature of even single | HPC nodes unless you want to risk perhaps an order of magnitude | performance loss. SMP these days doesn't stand for Symmetric | Multi-Processing. | zozbot234 wrote: | Distributed shared memory _is_ feasible in theory even via | being provided in-software by the OS. You 're right that this | would not change the physical reality of message passing, but | it would allow a single multi-processor application code to | operate seamlessly using either shared memory on a single | node, or distributed memory on a large cluster. | gnufx wrote: | I talk about the practice in HPC, not theory, and this | stuff is literally standard (remote memory of various types | and the same thing running the same, modulo performance and | resources, on a 32-core node as on one core each of 32 | nodes). However, you still need to consider network non- | uniformity at levels from at least NUMA nodes up, at least | if you want performance in general. | f0e4c2f7 wrote: | I very much agree with this and while Kubernetes is better than | a poke in the eye, I look forward to the day when there is a | true distributed OS available in the way you describe. It's | possible Kubernetes could even grow into that somehow. | Karrot_Kream wrote: | I think you're looking at the wrong abstraction level. You're | thinking on a node (computer) basis. Even on a single computer, | many of the things that happen are distributed. 
DMA | controllers, input interrupts, kernel-forced context switches, | there's a lot going on there but we still pretend that our | computers are just executing sequential code. I agree with the | OP and think it's high time we treat the computer as the | distributed system it is. Fuschia and GenodeOS are both making | developments in this direction. | zozbot234 wrote: | The abstractions are there in Linux, largely imported from plan | 9. And work is ongoing to support further abstractions, such as | easy checkpoint/restore of whole containers. Kubernetes is a | very new framework intended to support large-scale | orchestration and deployment in a mostly automated way, driven | by 'declarative' configuration; at some point, these features | will be rewritten in a way that's easier to understand and | perhaps extend further. | MisterTea wrote: | > The abstractions are there in Linux, largely imported from | plan 9. | | Which abstractions are those? | zozbot234 wrote: | > to be able to do computation, discovery, message | passing/IO, instrumentation over multiple nodes. | | Kernel namespaces are the building blocks for this, because | an app that accesses all kernel-managed resources via | separate namespaces is insulated from the specifics of any | single node, and can thus be transparently migrated | elsewhere. It enables the kind of location independence | that OP is arguing for here. | stormbrew wrote: | Linux namespaces don't actually do any of those things | though? Like, not even a single one of them are made | possible because of namespaces. They're all possible or | not possible precisely as much with or without | namespaces. | | The thing is when comparing plan9 and linux here, you | have to recognize that linux has it backwards. On plan9 | namespaces are emergent from the distributed structure of | the system. On linux they form useful tools to _build_ a | distributed system. | | But what's possible on plan9 is possible because it | really does do "everything is a file," so your namespace | is made up of io devices (files) and you can construct or | reconstruct that namespace as you need. | | Like, this[1] is a description of how to configure | plan9's cpu service so you run programs on another node. | | [1] | https://9p.io/wiki/plan9/Expanding_your_Grid/index.html | | Nothing in there makes any sense from a linux containers | perspective. You can't namespace the cpu. You can't | namespace the gui terminal. All you can namespace is | relatively superficial things, and even then opening up | that namespacing to unprivileged users has resulted in | several linux CVEs over the last year because it's just | not built with the right assumptions. | zozbot234 wrote: | Doesn't Linux create device files in userspace these | days, anyway? I thought that's what that udev stuff was | all about. So I'm not sure that the Plan9 workflow is | _inherently_ unfeasible, there 's just no idiomatic | support for it just yet. | stormbrew wrote: | device nodes are managed in userspace nowadays yes, but | they're just special files that identify a particular | device id pair and then the OS acts on them in a special | way. udev is just the userspace part of things that | manages adding and removing them in response to hotplug | events. Everything that matters about them is still | controlled by the kernel. | glorfindel66 wrote: | That's not at all what Linux namespaces permit. 
It's a | side effect of using them that could be leveraged using | something like CRIU, sure, but it's not what they're for | and they're not a building block for anything mentioned | in the portion of their comment you quoted. | | Namespaces simply make the kernel lie when asked about | sockets and users and such. It's intended for isolation | on a single server. They're next to useless in | distributed work, particularly the kind being discussed | here (Plan 9ish). You actually want the opposite: to | accomplish that, you want the kernel to lie even harder | and make things up in the context of those interfaces, | rather than hide things. Namespaces don't really get you | there in their current form. | zozbot234 wrote: | > That's not at all what Linux namespaces permit. | | Isolating processes from the specifics of the system | they're running on is a key feature of the namespace- | based model; it seems weird to call it a "side effect | only". We should keep in mind that CRIU itself is still a | fairly new feature that's only entered mainline recently, | and the kernel already has plenty of ways to "make up" | more virtual resources that are effectively controlled by | userspace. While it may be true that these things are | largely ad hoc for now, it's not clear that this will be | an obstacle in the future, | gnufx wrote: | I can talk about namespaces in HPC distributed systems, | and they don't look anything like Plan 9 to me. They make | life harder in various respects, and even dangerous with | Linux features that don't take them into account (like at | least one of the "zero-copy" add-on modules used by MPI | shared memory implementations). | NavinF wrote: | Eh I can't see Linux getting a built-in distributed kv store | (etcd) any time soon. Same goes for distributed filesystems. | All you have out of the box is nfs which gives you the worst of | both worlds: Every nfs server is a SPOF yet these servers don't | take advantage of their position to guarantee even basic | consistency (atomic appends) that you get for free everywhere | else. | | And besides how would you even implement all those features you | listed without recreating k8s? A distributed "ps -A" that just | runs "for s in servers; ssh user@$s ps; done" and sorts the | output would be trivial, but anything more complex (e.g. | keeping at least 5 instances of an app running as machines die) | requires distributed and consistent state. | zozbot234 wrote: | > requires distributed and consistent state | | Distributed yes, but not necessarily consistent. You can use | CRDTs to manage "partial, flexible" consistency requirements. | This might mean, e.g. sometimes having more than 5 instances | running, but should come with increased flexibility overall. | throwaway787544 wrote: | Fwiw those features existed in Mosix (a Linux SSI patch) 2 | decades ago... I feel like we could probably do it again | | In terms of CAP, yeah it might not have been technically as | reliable. But there's different levels of reliability for | different applications; we could implement a lot of it in | userland and tailor as needed | wwalexander wrote: | Plan 9 was designed in this way, but never took off. | | Rob Pike: | | > This is 2012 and we're still stitching together little | microcomputers with HTTPS and ssh and calling it revolutionary. | I sorely miss the unified system view of the world we had at | Bell Labs, and the way things are going that seems unlikely to | come back any time soon. 
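(To make the CRDT remark a few comments up concrete: the simplest CRDT is a grow-only counter. Each node increments only its own slot, and merging is a pointwise max, so replicas converge in whatever order gossip delivers them, with no consensus round. A toy sketch; real systems layer deletion, causality tracking, and so on over this.)

    class GCounter:
        """Grow-only counter CRDT: per-node slots, merge by pointwise max."""
        def __init__(self, node_id):
            self.node_id = node_id
            self.counts = {}                     # node_id -> count seen so far

        def increment(self, n=1):
            self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

        def value(self):
            return sum(self.counts.values())

        def merge(self, other):
            # Commutative, associative, idempotent: safe to apply in any order.
            for node, count in other.counts.items():
                self.counts[node] = max(self.counts.get(node, 0), count)

    if __name__ == "__main__":
        a, b = GCounter("a"), GCounter("b")
        a.increment(2)                           # updates happen independently...
        b.increment(5)
        a.merge(b); b.merge(a)                   # ...and any exchange order converges
        assert a.value() == b.value() == 7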
| monocasa wrote: | Are there any good walkthroughs of what a good, distributed | plan 9 setup looks like from either a development or a | administration perspective? Particularly an emphasis on many | distributed compute nodes (or cpu servers in plan 9 | parlance). | jasonwatkinspdx wrote: | I think Rob is right to call out the problem, but is being a | bit rose colored about Plan 9. | | Plan 9 was definitely ahead of its time, but it's also a far | cry from the sort of distributed OS we need today. | "Everything is a remote posix file" ends up being a really | bad abstraction for distributed computing. What people are | doing today with warehouse scale clusters indeed has a ton of | layers of crap in there, and I think it's obvious to yern for | sweeping that away. But there's no chance you could do that | with P9 as it was designed. | wahern wrote: | "Everything is a file" originally referred to read and | write as universal object interfaces. It's similar to | Smalltalk's send/receive as an idealized model for object- | based programming. Hierarchical filesystem namespaces for | object enumeration and acquisition is tangential, though it | often works well because most namespaces (DNS, etc) tend to | be hierarchical. (POSIX filesystem semantics doesn't really | figure into Plan 9 except, perhaps, incidentally.) | Filesystem namespacing isn't quite as abstract, though | (open, readdir, etc, are much more concrete interfaces), | making impedance mismatch more likely. | | The abstraction is sound. We ended up with TCP and HTTP | instead of IL and 9P (and at scale, URLs instead of file | descriptors), because of trust issues, but that's not | surprising. Ultimately the interface of read/write sits | squarely in the middle of all of them, and most others. To | build a distributed system with different primitives at the | core, for example, send/receive, requires creating | significantly stronger constraints on usage and | implementation environments. People do that all the time, | but in _practice_ they do so by _building_ atop the file | interface model. That 's what makes the "everything is a | file" model so powerful--it's an interoperability sweet | spot; an axis around which you can expect most large-scale | architectures to revolve around at their core, even if the | read/write abstraction isn't visible at the point users | (e.g. application developers) interact with the | architecture. | jasonwatkinspdx wrote: | A hierarchical namespace is fine, but the | open/read/write/sync/close protocol on byte based files | is definitely inadequate. The constraints on usage you | decry are in fact fundamental constraints of distributed | computing that are at odds with the filesystem | abstraction. And this is exactly what I was getting at in | talking about rose colored glasses with P9. It in no way | is a replacement for something like Colossus or Spanner. | zozbot234 wrote: | > P9 ... in no way is a replacement for something like | Colossus or Spanner. | | Colossus and Spanner are both proprietary so there's very | limited info on them, but both seem to be built for very | specialized goals and constraints. So, not really on the | same level as a general system interface like 9P, which | is most readily comparable to, e.g. HTTP. In Plan 9, 9P | servers are routunely used to locally _wrap_ connections | to such exotic systems. You can even require the file | system interface locally exposed by 9P to be endowed with | extra semantics, e.g. via special messages written to a | 'control' file. 
So any level of compatibility or lack | thereof with simple *nix bytestreams can be supported. | NavinF wrote: | Meh. Every time a 9p server dies, every client dies. Plan9 is | not comparable to k8s. | nautilus12 wrote: | Glad to see Plan 9 getting some love in the comments even if | it didn't make it into the article. | gnufx wrote: | If you yearn for Plan 9 -- I'm not sure I do -- Minnich's | current incarnation of the inspiration seems to be | https://github.com/u-root/cpu | jlpom wrote: | This describes more a Single System Image [0] to me (WPD | includes Plan 9 as one, but considering it does not | support process migration I find that moot). LinuxPMI [1] | seems to be a good idea but it seems to be based on Linux | 2.6, so you would have to heavily patch a newer kernel. The | only things that seem to support process migration with | current software / are still active are CRIU [2] (which doesn't | support graphical/wayland programs) and DragonflyBSD [3] (in | their own words, very basic). | | [0]: https://en.wikipedia.org/wiki/Single_system_image [1]: | http://linuxpmi.org [2]: criu.org [3]: https://man.dragonflybsd.org/?command=sys_checkpoint&section... | zozbot234 wrote: | Graphical programs could be checkpointed and restored as | long as they don't directly connect to the hardware. | (Because the checkpoint/restore system has no idea how to | grab the hardware's relevant state or replicate it on | restore.) This means running those apps in a hardware-independent way (e.g. using a separate Wayland instance | that connects to the system one), but aside from that it | ought to be usable. | jlpom wrote: | For CRIU it is not supported: | https://criu.org/Integration#Wayland.2FWeston, also in my | experience it doesn't work. Are you talking about some | other software? | zozbot234 wrote: | It has been done "virtually" by going through e.g. VNC | https://criu.org/VNC . Alternatively, CRIU apps could be | required to use virt-* devices, which CRIU might | checkpoint and restore similar to VMs. | stormbrew wrote: | I don't really see any reason to consider process migration | a required feature of either a distributed OS or a single | system image. Even on a single computer this isn't always | practical or desirable (i.e. you can't 'migrate' a program | running on your GPU to your CPU, and you can't trivially | migrate a thread from one process to another either). | | Not all units of computation are interchangeable, and a | system that recognizes this and doesn't try to shoehorn | everything down to the lowest common denominator actually | _gains_ some expressive power over a uniform system (else | we would not have threads). | gnufx wrote: | For what it's worth, the HPC-standard way of | checkpointing/migrating distributed execution (in | userspace, unlike CRIU) is https://dmtcp.sourceforge.io/ It | supports X via VNC -- I've never tried -- but I guess you | could use xpra. | MisterTea wrote: | > This describes more a Single System Image [0] to me | | No, Plan 9 is not an SSI OS. The idea is that all resources are | exposed via a single unified file-oriented protocol: 9p. | All devices are files, which means all communication happens | over fds, meaning you look at your computer like a patch | bay of resources, all communicated with via read() and | write(). e.g.: [physical disk]<-->[kernel: sd(3)]-----< /dev/sdE0/ [audio card]<---->[kernel: audio(3)]--< /dev/audio [keyboard]-------->[kernel: kbd(3)]----< /dev/kbd | | Looking above it looks like Unix but with MAJOR | differences.
First off the disk is a directory containing | partitions which are just files who's size is the | partitions size. You can read or write those files as you | please. Since the kernel only cares about exposing hardware | as files, the file system on a partition needs to be | translated to 9p. We do this with a program that is a file | server which interprets e.g. a fat32 fs and serves it via | 9p (dossrv(4)). Your disk based file system is just a user- | space program. | | And since files are the interface you can bind over them to | replace them with a different service like mixfs(4). | /dev/audio is like the old linux oss where only one program | could open a sound card at a time. To remedy this on plan 9 | you run mixfs which opens /dev/audio and then binds itself | over /dev replacing /dev/audio in that namespace with a | multiplexed /dev/audio from mixfs. Now you start your | window manager and the children programs will see mixfs's | /dev/audio instead of the kernel /dev/audio. Your programs | can now play audio simultaneously without changing | ANYTHING. Now compare that simplicity to the trash fire | linux audio has been and continues to be with yet another | audio subsystem. | | Keyboard keymaps are a filter program sitting between | /dev/kbd and your program. All it does is read in key codes | and maps key presses according to a key map which is just a | file with key->mapping lines. Again, keyboards are files so | a user space file server can be a keyboard such as a GUI | keyboard that binds itself over /dev/kbd. | | Now all those files can be exported or imported to other | machines, regardless of CPU architecture. | | Unix is an OS built on top of a single machine. Plan 9 is a | Unix built on top of a network. It's the closest I can get | to computing nirvana where all my resources are available | from any machine with simple commands that are part of the | base OS which is tiny compared to the rest. | emteycz wrote: | Best explanation of Plan 9 I've ever seen | [deleted] | [deleted] | benreesman wrote: | Eric Brewer thinks this is a good point of view on such things: | | https://codahale.com/you-cant-sacrifice-partition-tolerance/ | | L1-blockchain entrepreneurs and people who got locked into | MongoDB aside, I think most agree. | pkilgore wrote: | What is the kernal and the bus for the cloud? | simne wrote: | These all now are virtual state machines, which store some | state and convert all kernel/bus behavior to interaction with | connected via network devices. | | At the moment there are lot of such devices - exists for sure | many full featured, like Raspberry; but also there are network | connected ATA drives, network connected sensors, RAM, ROM | (Flash); BTW IEEE 1394 FireWire is serial interface, could been | used as networking bus; exists adapters ethernet-usb (and many | commodity devices work well with such connection), so virtually | anywhere could been considered as connected via network bus. | Even exists USB 3.0 to PCIe adapter, to use PCIe device throw | USB connection. | | And in reality exists problem, that FireWire so distributed, | that it where possible on Macs with FireWire interface, to read | memory via this interface. | | So hardware and software exists, but need some steps to make | it's usage safe. | WestCoastJustin wrote: | Great post called "Achieving 11M IOPS & 66 GB/s IO on a Single | ThreadRipper Workstation" [1, 2] that basically walks through | step-by-step that your computer is just a bunch of interconnected | networks. 
| | Highly recommend the post if you're into this; it's also sort of | amazing how far single systems have come. You can basically do | "big data" type things on this single box. | | [1] https://tanelpoder.com/posts/11m-iops-with-10-ssds-on-amd-th... | | [2] https://news.ycombinator.com/item?id=25956670 | syngrog66 wrote: | once you learn to bias toward thinking in terms of message passing | between actors, and toward immutable shared state, | then a lot of problems become easier to decompose and solve | elegantly, esp. at scale | hsn915 wrote: | Yes but your computer will not gracefully handle CPUs randomly | failing or RAM randomly failing. Sure, storage devices can come | and go, but that's been the case since forever, and most programs | are not written to handle this edge case gracefully. Except for | the OS kernel. | | The links between the components of your computer are solid and | cannot fail like actual computer network connections. | | In terms of the "CAP" theorem, the system has no Partition tolerance. | If one of the links connecting CPUs/GPUs/RAM breaks, all hell | breaks loose. If a single instruction is not processed correctly, | all hell might break loose. | | So I find the analogy misleading. | StillBored wrote: | There have been machines tolerant of CPU and memory failures, and | to a certain extent this sorta works on some of the higher-end | machines that support RAM/CPU hotplug. (historically see | hp/tandem/nonstop, sunos/imp, etc). | | The problem is that Linux's monolithic model doesn't work well for | kernel checkpoint/restore; despite it actually supporting | hotplug CPU/RAM, they have to be gracefully removed. | | So, this is less about the machine being distributed, and more | about the fact that Linux is the opposite of a microkernel/etc | that can isolate and restart its subsystems in the face of | failure. It's also sorta funny that while these types of | operations tend to need to be designed into the system, the | last major OS's designed this way were done in the 1980's. | dwohnitmok wrote: | I know of no OSes that are resilient to CPU cores producing | wrong results (or incorrect mem results: I consider ECC a | lower level concern that is not part of the OS), whereas a | lot of distributed consensus algorithms have this built into | their requirements. EDIT: I have heard through the grapevine | that something like this might be done for aerospace, but I | have no personal experience with that. | | I agree with parent. The major reason why programming on a | single computer is easier than a distributed system is that | we assume total resilience of various components, which we | cannot assume for a distributed system. | | From the article: | | > This offers hope that it is possible to some day abstract | away the distributed nature of larger-scale systems. | | To do this is not a question of software abstractions, but | hardware resilience. If we have a network which we can | reasonably assume to have 100% uptime and absolutely no | corruption between all its components then we can program | distributed systems as single computers. | catern wrote: | Most distributed consensus algorithms, or distributed | systems in general, are not resilient to nodes producing | arbitrary wrong results. That's the realm of systems like | Bitcoin, which achieve such resilience by paying big | performance costs. | | So it shouldn't be surprising that computers have the same | lack of resilience. | anonymousDan wrote: | Sorry what?
That is exactly the purpose of Byzantine | fault tolerant consensus algorithms, which have been | around for many years. | StillBored wrote: | The tandems I listed above, originally used lock stepped | processors, along with stratus/etc. | | edit: Googling yields few results that aren't actual books, | Try this | | https://books.google.com/books?id=wBuy0oLXEuQC&pg=PA218&lpg | =... | dwohnitmok wrote: | Ah well there you go. Had no idea they used lock | stepping! | gnufx wrote: | It doesn't count as resilient in the mainframe sense, but in | an effort to encourage system management, I ran the Node | Health Check system on our "commodity" HPC cluster and found | multiple failed DIMMs and a failed socket no-one had noticed. | (I'd had enough alerts from that on a cluster I managed.) | imtringued wrote: | The article also ignores that e.g. the CUDA API looks nothing | like a local function call. People are explicitly aware when | they are launching GPU kernels. | bee_rider wrote: | You can 'disable' a core in Linux pretty easily, although I'm | not sure to what extent you'd consider this graceful (in the | sense that you write to a system file and then some magic, | which may be arbitrarily complicated I guess, happens in the | background. So it doesn't seem equivalent to just yanking a | core from the package, if that were possible). | aidenn0 wrote: | I think that TFA gets it exactly backwards. It's not that we | will be able to treat multi-node systems as non-distributed | it's that single-nodes will have to start being treated like | distributed systems. | | > The links between the components of your computer are solid | and cannot fail like actual computer network connections. | | I've personally had this disproven to me on multiple occasions. | catern wrote: | >I've personally had this disproven to me on multiple | occasions. | | That sounds like interesting stories! Can you elaborate? | aidenn0 wrote: | Accidents on desktop hardware: | | Multiple bad disk cables (more common in IDE era, but | happened once with SATA). Interestingly enough, Windows | would reduce the drive speed on certain errors, so I had a | drive that booted up in UDMA/133 and the longer it was | running the slower it got, eventually settling in at PIO | mode 2. Switching the drive cable fixed it. | | A sound card that wasn't screwed in to the case, so if you | pushed the phone connector in too hard it would unseat. I | still don't know how that happened; it must have been me | (unless someone pranked me) but the sound-card hadn't been | changed in like 2 years at that point. | | A DIMM wasn't fully clipped in, but the system worked fine | for weeks until someone bumped into the case. | | Things that were actually intentional: | | We expect anything plugged in externally (e.g. USB, | ethernet, HDMI) to be plugged and unplugged without needing | to restart the system. This sounds banal, but wasn't always | the case. I had a network card with 3 interfaces (10BASE5 | AUI, 10BASE2 BNC, 10BASE-T modular plug) and you needed to | power off the system and toggle a DIP switch to change | which was in use. | | I've seen server and minicomputer hardware with | hotpluggable CPUs and RAM | | Eurocard type systems (e.g. VME, cPCI) could connect all | sorts of things, and could run without restarting. This | sort of blurs the line as to what a "node" is. If you have | multiple CPUs on the same PCI bus, is that one node or | many? | | eGPUs have made hotplugging a GPU something that anyone | might do today. 
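(The sysfs knob mentioned a few comments up looks roughly like this. The paths are the standard Linux CPU-hotplug interface, writing needs root, and cpu0 typically has no "online" file; treat it as a sketch of the cooperative case the kernel plans for, not of a surprise failure.)

    from pathlib import Path

    CPU_DIR = Path("/sys/devices/system/cpu")

    def online_cpus():
        """CPUs whose 'online' knob currently reads 1."""
        return [knob.parent.name
                for knob in sorted(CPU_DIR.glob("cpu[0-9]*/online"))
                if knob.read_text().strip() == "1"]

    def set_cpu(cpu, online):
        """Gracefully offline/online a core; the kernel migrates its tasks first."""
        (CPU_DIR / cpu / "online").write_text("1" if online else "0")

    if __name__ == "__main__":
        print("online:", online_cpus())
        # set_cpu("cpu3", False)    # needs root; work on cpu3 gets migrated away
        # set_cpu("cpu3", True)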
If you run this setup, then the majority of | the computational power in your system can appear and | disappear at will, along with multiple GB of RAM. | catern wrote: | >Yes but your computer will not gracefully handle CPUs randomly | failing or RAM randomly failing | | That's incorrect. | | There are plenty of machines/OSs which are (or can be) | resilient to a CPU failing; Linux, for example. From the OS | point of view, you just kill the process that was running on | the CPU at the time and move on. | | Resilience to spontaneous RAM failures is rarer but possible. | bee_rider wrote: | Killing the processes running on the compute element seems | not very graceful, right? I'd expect a gracefully handled | failure to have some state staved from which the computation | can be continued. | | Which would be overkill on a single node, given that CPUs | don't really fail all that often. | catern wrote: | It's up to userspace to do more than that. There are other | issues which can cause processes to be spontaneously killed | (OOMkiller for example) so it's something you should be | tolerant of. | nickelpro wrote: | Disagree. An environment that's being reaped by OOMK is | not stable enough to make assumptions about. You're in | "go down the hall and turn it off and on again" | territory. | | Attempting to account for such environments in user | programs massively inflates their complexity, does little | to enhance reliability, and the resulting behavior is | typically brittle or outright broken from the get go. | | This is why, for example, the C++ committee flirts with | making allocation failure a UB condition. | AdamH12113 wrote: | >[The fact that computers are made of many components separated | by communication buses] suggests that it may be possible to | abstract away the distributed nature of larger-scale systems. | | This is a neat line of thought, but I don't think it can go very | far. There is a huge difference in reliability and predictability | between small-scale and large-scale systems. One way to see this | is to look at power supplies. Two ICs on the same board can be | running off of the same 3.3V supply, and will almost certainly | have a single upstream AC connection to the mains. When thinking | about communications between the ICs, you don't have to consider | power failure because a power failure will take down both ICs. | Compare this to a WiFi network where two devices could be on | separate parts of the power grid! | | Other kinds of failures are rare enough to be ignored completely | for most applications. An Ethernet cable can be unplugged. A PCB | trace can't. | | I used to work with a low-level digital communication protocol | called I2C. It's designed for communication between two chips on | the same board. There is no defined timeout for communication. A | single malfunctioning slave device can hang the entire bus. | According to the official protocol spec, the recommended way of | dealing with this is to reset every device on the bus (which may | mean resetting the entire board). If a hardware reset is not | available, the recommendation is to power-cycle the system! [1] | | Now I2C is a particularly sloppy protocol, and higher-level | versions (SMBus and PMBus) do fix these problems, so this is a | bit of an extreme example. But the fact that I2C is still | commonly used today shows how reliable a small-scale electronic | system can be. 
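(A sketch of the defensive wrapper that the missing bus timeout forces on you in practice: the deadline lives in the host software, and recovery falls back to whatever reset the board provides. The transaction and reset_bus callables are placeholders for a real I2C access layer, e.g. a /dev/i2c-* read and a reset GPIO.)

    import concurrent.futures

    class BusHung(Exception):
        pass

    def i2c_with_timeout(transaction, reset_bus, timeout_s=0.1, retries=2):
        """Impose a deadline on a bus transaction that I2C itself never will."""
        for _ in range(retries + 1):
            pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            future = pool.submit(transaction)
            try:
                result = future.result(timeout=timeout_s)
                pool.shutdown(wait=False)
                return result
            except concurrent.futures.TimeoutError:
                # A wedged transfer can't be cancelled from up here, which is
                # the point made above: recovery means resetting hardware.
                pool.shutdown(wait=False)
                reset_bus()
        raise BusHung(f"no response after {retries + 1} attempts")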
Even at the PC level, low-level hardware faults | are rare enough that they're often indicated only by weird | behavior ("My system hangs when the GPU gets hot"), and the | solution is often for the user to guess which component is broken | and replace it. | | [1] Section 3.1.16 of https://www.nxp.com/docs/en/user- | guide/UM10204.pdf | taeric wrote: | So much of programming languages is to hide the distributed | nature of what the computer is doing on a regular basis. This is | somewhat obvious for thread abstractions where you can get two | things happening. It is blatant for CUDA style programming. | | As this link points out, it gets a bit more difficult with some | of the larger machines we have to keep the abstractions useful. | That said, it does mostly work. Despite being able to find and | harp on the areas that it fails, it is amazing how well so many | of the abstractions have held up. | | Would be neat to see explicit handling of what features are | basically completely hiding distributed nature of the computer. | jayd16 wrote: | The abstractions aren't just for simplicity. In many cases, | ensuring that the distributed nature is unknown or unobserved | means the system can make different decisions without affecting | the program. This leaves room for flexibility in the system | design. | taeric wrote: | Distributed problems that are largely timing are easy to see | in this nature. In large, the whole synchronize on a clock | idea is invisible to programmers. | | That said, there are times when it isn't hidden, but only | taken out of your control. I guess the question is mainly in | how to move them to first class objects to reason about? | Karrot_Kream wrote: | Maybe? Alternatively by bringing the distributed nature up | front-and-center you can have more flexible designs. If I | could timeout my drawing routine when the screen has already | refreshed (or context has been stolen from the OS) then I | have a lot more flexibility in how to recover instead of | pretending to do my best and ending up with a lot of screen | tearing when I miss my frame budget. | jayd16 wrote: | I'm trying to wrap my head around where this would happen | in a way that made sense. Derailing the GPU pipeline from | the OS probably doesn't make much sense. If we're talking | about the OS halting the CPU side of the render I guess | that would maybe be useful? Even on a single core machine | it would be equally useful so I don't know if its a case of | distribution per se... | | But in the abstract, sure. It's a give and take. It's | useful to know things and use that knowledge. It's also | useful to know a detail is hidden and changeable without | consequence. | Karrot_Kream wrote: | Yeah I'm thinking the OS halts the CPU side of the render | and, say, stuffs an errno into a register after the | routine so the CPU can see what happened and recover. If | I were writing a program that required a minimum frame | rate and I missed multiple frames, it would probably be | nicer for the user if I displayed a message that I was | just unable to write a frame at the required speed and | quit rather than screen tear and frustrate the user. | | A similar situation happens if my NIC/kernel buffers are | to overloaded to send the packets I need out. Instead I | can try in vain to push packets out and have almost no | understanding how many packets the OS is dropping just to | keep up. 
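(A small illustration of how little of this today's socket API surfaces: with a non-blocking UDP socket the kernel will at least report that its own send buffer is full, but anything dropped further down the path -- the qdisc, the NIC ring, the far end -- produces no error here at all, which is the complaint above. The destination address and packet size are placeholders.)

    import socket, time

    DEST = ("127.0.0.1", 5005)        # placeholder destination
    PACKET = b"x" * 1200

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)

    sent = backpressure = 0
    deadline = time.monotonic() + 1.0
    while time.monotonic() < deadline:
        try:
            sock.sendto(PACKET, DEST)
            sent += 1
        except BlockingIOError:        # kernel send buffer full: visible locally
            backpressure += 1
        except OSError:                # e.g. ENOBUFS on some systems
            backpressure += 1

    print(f"sent={sent} locally-visible backpressure={backpressure}")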
Media standards like RTCP were designed around | scenarios like these, but that itself is complexity we | wouldn't need if the OS could notify the application when | their packet writes failed. | | This kind of flexibility right now is really difficult | because most OSs try to pretend as hard as possible that | everything happens sequentially. This is just about | opening up more complete abstractions to the programmer. | zozbot234 wrote: | The distributed nature can never be _unobserved_ , by | definition. What a well-designed distributed system can do is | offer facilities to enable useful constraints on its | operation, that might then be used as necessary via a | programming language. | [deleted] | Koshkin wrote: | Yes, and concurrency is, in fact, an implementation detail. Which | is why I think that in most _applied_ scenarios it should be | hidden, and taken care of, by the compiler. | it wrote: | The Erlang VM (BEAM) can be viewed as a distributed operating | system, or at least the beginnings of one. | simne wrote: | Agree, and could add, that ALL Erlang flavors (exists at least | 4 independent implementations for different environments and | for different targets) are distributed. | | And Erlang is based on relatively new syntax from Prolog, which | also have cool ideas. | tonymet wrote: | i recommend people model their apps this way. spin up more | threads than needed, one each for api , DB , LB, async, pipelines | etc. you can model an entire stack in one memory space. It's a | great way to prototype your complete data model before scaling to | the proper solutions. Lots of design constraints are found this | way . everything looks great on paper but then falls apart when | integrating layers. | bumblebritches5 wrote: | simne wrote: | Unfortunately, this idea fights vs idea of least responsibility. | | Because, user level programs are all at one level of abstraction, | and this distribution is distributed over many levels of | abstraction. | | So in desktop systems, mean mostly successors of business micro | machines, access to other levels of abstraction intentionally | hardened for measures of security and reliability. The same thing | applied to crowd computing - there also vps's are isolated from | hardware and from other vps's. | | These measures usually avoided in game systems and in embedded | systems, but they are not allowed to run multiple programs from | independent developers (for security and reliability), and their | programming magnitudes more expensive than desktops and even | server side (yes, you may surprised, but game consoles software | in many case more reliable than military, and usually far surpass | business software). | | To solve this contradiction, need some totally new paradigms and | technologies, may be some revolutionary, like usage of GAI to | write code. ___________________________________________________________________ (page generated 2022-03-30 23:00 UTC)