[HN Gopher] Teleforking a Process onto a Different Computer ___________________________________________________________________ Teleforking a Process onto a Different Computer Author : trishume Score : 365 points Date : 2020-04-26 15:19 UTC (7 hours ago) (HTM) web link (thume.ca) (TXT) w3m dump (thume.ca) | crashdelta wrote: | THIS IS REVOLUTIONARY! | anthk wrote: | What I'd love is easily binding remote directories as local. | Not NFS, but a braindead 9p. If I don't have a tool, I'd love to | have a bind-mount of a directory from a stranger, and run a | binary from within it (or pipe to it) without them being able to trace | the I/O. | | If the remote FS is on a different arch, I should be able to run the | same binary remotely as a fallback option, seamlessly. | crashdelta wrote: | This is one of the best side projects I've ever seen, hands down. | trishume wrote: | <3 | zozbot234 wrote: | The page states that CRIU requires kernel patches, but other | sources say that the kernel code for CRIU is already in the | mainline kernel. What's up with that? | trishume wrote: | I didn't say it requires kernel patches, just that it requires | a build flag and new enough versions. It does happen that most | distros enable that build flag, though. | | There is a patch set for write-protection support in | userfaultfd that was just merged recently and I'm not sure if | it's made it into any actual releases yet. | cyphar wrote: | CRIU does work on mainline kernels and has for several years. | However, it is still being actively developed (userfaultfd is | the example the article uses -- a mainline feature which is | still seeing lots of development). | | It doesn't usually require additional patches to work, though | they do have a fairly big job (every new kernel feature needs | to be added to CRIU so it can both checkpoint and restore it -- | and not all features are trivial to checkpoint).
It's entirely | possible that programs using very new kernel features may | struggle to get CRIU to work. | hawski wrote: | Seriously cool. That also reminds me of DragonFlyBSD's process | checkpointing feature that offers suspend to disk. In the Linux world | there have been many attempts, but AFAIK nothing simple and complete | enough. To be fair I don't know if DF's implementation is that | either. | | https://www.dragonflybsd.org/cgi/web-man?command=sys_checkpo... | | https://www.dragonflybsd.org/cgi/web-man?command=checkpoint&... | jabedude wrote: | This article[0] on LWN seems to suggest that Linux has no | kernel support for checkpoint/restore because it's hard to | implement (no arguments there). But hypervisors support | checkpoint/restore for virtual machines, e.g. ESXi VMotion and | KVM live migration, so it seems like these technical problems | are solvable. Indeed all the benefits of VM migration seem to | also apply to process migration (load balancing, service | uptime, etc). | | 0. https://lwn.net/Articles/293575/ | dirtydroog wrote: | They seem to be different problems though, as that LWN | article suggests. It's probably easier to checkpoint/restore | an entire image's state than an individual process' state. | | But even calls like gettimeofday() work differently when | running on hypervisors than they do when running on bare | metal. | colinchartier wrote: | The hypervisor migration features you mention are really | powerful and stable. I founded a startup that does fast CI | using them. | | This is what it looks like: | https://layerci.com/jobs/github/layerdemo/layer-gogs-demo/18 | | (in this run it restored a previous VM snapshot to skip the | setup, and took a snapshot after the webserver started to | give a lazy-loaded staging server) | colinchartier wrote: | There's also a project called CRIU that is used | experimentally by Docker for container save/load: | https://www.criu.org/Main_Page | jraph wrote: | You were faster than me.
Yes, CRIU does checkpointing in | userspace on Linux. | | They contributed a lot of patches to the Linux kernel to | support this feature. So there isn't monolithic support for | checkpointing in the mainline kernel. Rather, there are | many kernel features that allow building such a feature in | userspace. For instance, you can set the PID of the next | program you spawn on Linux thanks to CRIU, because CRIU | needs to be able to restore processes with the same PIDs as | when they were checkpointed [1]. | | CRIU is used by OpenVZ, which, if I remember correctly, is | moving (or has moved) away from a kernel-based approach. | | [1] http://efiop-notes.blogspot.com/2014/06/how-to-set-pid-using... | jabedude wrote: | This was immensely informative, thank you for that link. | CRIU looks like an awesome project. I can't believe I | haven't heard of them prior to today. | emmelaich wrote: | CRIU and DMTCP are mentioned in the article. | [deleted] | justapassenger wrote: | IMO, a big reason why Linux doesn't support it is that the big | tech companies, who drive a lot of development and funding, | found that it's just not worth it. | | Tech moved away from the mainframe model of keeping single | processes up for as long as possible and isolating software | from failures. Instead they embrace failure at the | software layer, which gives you (when executed correctly) | the same high level of availability, but also layers of | protection against other issues (a node going down in an | expected way is the same, no matter what the root cause is) | and makes systems more maintainable and upgradable overall. | | Basically, why deal with complicated checkpoints at the kernel | level, which doesn't understand your software, when you can | instead deal with that at the software layer, with full | control over it? | spullara wrote: | Swap performance has been similarly underinvested in. | justapassenger wrote: | Yeah, and for exactly the same reasons.
| | In the old days swap was a necessity to keep the system up, but | nowadays you need it only if your dataset is really big | (other than some weirdness in the kernel, which actually likes | to have some small amount of swap to do cleanup and | maintenance). | | And if you have that problem, you have custom solutions | that implement the memory hierarchy for you, which give you | much better results, as they can work across all the | layers (cache/hdd/ssd/net/etc) in a much more effective way | than generic, software-agnostic solutions focused on just | one dimension. | snazz wrote: | How commonly do people need to use swap in the first | place? On my current computer, I don't even have a swap | partition because I've never gotten close to filling my | 16GB, even with a big virtual machine and hundreds of | browser tabs. In my experience on desktop systems, it | just makes your computer painfully slow when you write a | program that allocates far too much memory. Without swap, | the OOM killer gets involved much earlier and you don't | have to do a hard power cut to stop the disk from | thrashing. | cesarb wrote: | I just opened the site of the biggest retailer in my | country, and looked at one entry-level laptop: it has 2GB | of RAM. Not everybody has a high-end computer. | saalweachter wrote: | What's hilarious to me, growing up in the era of swap, is | that now you can have so much RAM that you can start | caching your disk to RAM for a little extra performance | (well, we've kind of moved past that, too, since SSDs are | so much faster than spinny disks). | haneefmubarak wrote: | When done well, swap can be pretty useful. I have a Fall | 2019 MacBook Pro maxed out on CPU (8 cores) and RAM (64 | GB) - MacOS still regularly compresses (highest I've seen | so far: 12 GB) and/or swaps (highest I've seen so far: 16 | GB) relatively unused memory to make more space for other | applications and hot caches (which are still useful even | with a PCIe SSD).
| | Keep in mind that everything feels fully snappy and | responsive while this is all happening. | barrkel wrote: | If you don't have enough memory for a single app, then | your life is completely miserable - in that case, | swapping is almost indistinguishable from a crash. But | swap can be the difference between being able to run one | app at a time, vs running three or four, as long as you only | have one in the foreground at a time. | | Swap doesn't play well with GC, or pointerful programming | languages generally. Swap is a latency hit every time you | touch an evicted page, and the more pointers your | structures have, the more frequently that occurs. | | In older times, when memory needed to be conserved, GC | was less frequently used, and as a consequence of the | accounting overheads of manual memory allocation, | pointerful data structures weren't quite as common. | Instead, you'd have pre-allocated arrays which supported | up to N items of a particular type (and tough luck if you | wanted more), or in more sophisticated applications, | arena allocators. In such systems, the working set of | memory is more likely to be contiguous; paging works | better because every page load is more likely to bring in | relevant data. | | Switching between two open applications was the moment | where you'd typically hit swap; that, and every now and | then when you touched an area of the app you hadn't in a | while, or jumped to a distant area of the document you | were working on. The HDD light would come on, and you'd | have to wait a few seconds while the computer ground | through (HDDs being audible, some much louder than | others) before you could switch. | | At some point in the mid 2000s, GC languages tipped the | balance, especially JS. The browser became the main | application, and wouldn't tolerate swap well. It became a | mini OS unto itself, and had to balance the competing | needs of different tabs.
Then mobile exploded, and didn't | use swap at all in order to compete with the iPhone, | remarkable for its smooth animations. If there was to be | latency, the application had to hide it; OS-level | introduced latency on a page fault wasn't good enough, | because if it happened on the UI thread, you dropped | frames. | | And of course swap never worked well on servers with | frequent requests. Swapping on a server is a great way to | back up all your queues. | saagarjha wrote: | It's touched on at the very end, but this kind of work is | somewhat similar to what the kernel needs to do on a fork or | context switch, so you can really figure out what state you need | to keep track of from there. Once you have that, scheduling one | of these network processes isn't really all that different from | scheduling a normal process, except of course the syscalls on the | remote machine will possibly go to a kernel that doesn't know | what to do with them. | new_realist wrote: | See https://criu.org/Live_migration | abotsis wrote: | Also of interest might be Sprite - a Berkeley research OS | developed "back in the day" by Ken Shirriff and others. It | boasted a lot of innovations like a logging filesystem (not just | metadata) and a distributed process model and filesystem allowing | for live migrations between nodes. | https://www2.eecs.berkeley.edu/Research/Projects/CS/sprite/s... | jka wrote: | This reminds me a little bit of the idea of 'Single System | Image'[1] computing. | | The idea, in abstract, is that you login to an environment where | you can list running processes, perform filesystem I/O, list and | create network connections, etc -- and any and all of these are | in fact running across a cluster of distributed machines.
| | (in a trivial case that cluster might be a single machine, in | which case it's essentially no different to logging in to a | standalone server) | | The wikipedia page referenced has a good description and a list | of implementations; sadly the set of {has-recent-release && is- | open-source && supports-process-migration} seems empty. | | [1] - https://en.wikipedia.org/wiki/Single_system_image | macintux wrote: | That was the original concept that led Ian Murdock and John | Hartman to found Progeny. The idea was that overnight, while no | one was working at their desks, companies could reboot their | Windows boxes into a SSI network of Linux nodes to run parallel | compute tasks. | | Roughly, anyway, I got the sales pitch 20 years ago so my | memories are fuzzy. I wasn't remotely sold on it but was so | anxious to work for a Linux R&D company in Indianapolis of all | places that I accepted the job anyway. | | Sadly we didn't get far on the concept before the dot-com | crash. Absent more venture capital we pivoted to focus on | something we could sell, Progeny Linux, and tried to turn that | into a managed platform for companies who wanted to run Linux | on their appliances. | ISL wrote: | What's old is new again -- I'm pretty sure QNX could do this in | the 1990s. | | QNX had a really cool way of doing inter-process communication | over the LAN that worked as if it were local. Used it in my first | lab job in 2001. You might not find it on the web, though. The | API references were all (thick!) dead trees. | | Edit: Looks like QNX4 couldn't fork over the LAN. It had a | separate "spawn()" call that could operate across nodes. | | https://www.qnx.com/developers/docs/qnx_4.25_docs/qnx4/sysar... | Proven wrote: | +1. I was going to mention this (less eloquently) | checker659 wrote: | > I'm pretty sure QNX could do this in the 1990s. | | Plan 9, SmallTalk | imglorp wrote: | Erlang of course. All spawns and messages can be local or | remote. 
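The remote-spawn primitive the Erlang comments describe can be mimicked in a few lines. This is a hedged Python sketch, not Erlang's real API: the "node" here is just the local machine, and `spawn_on_node` is an illustrative name, but the shape is the same idea as `spawn/4` — ship a module/function/args triple, get back a pid.

```python
import importlib
import multiprocessing as mp

def _run(module, function, args, out):
    # Resolve Module:Function by name and run it, sending the result back.
    fn = getattr(importlib.import_module(module), function)
    out.put(fn(*args))

def spawn_on_node(module, function, args):
    """Spawn module.function(*args) in a fresh process; return (pid, result queue).

    A real implementation would serialize the triple over the network to a
    daemon on the remote node; here the "node" is the local machine.
    """
    out = mp.Queue()
    p = mp.Process(target=_run, args=(module, function, args, out))
    p.start()
    return p.pid, out

if __name__ == "__main__":
    pid, out = spawn_on_node("math", "factorial", [5])
    print(pid > 0, out.get())  # → True 120
```

The pid-plus-mailbox return value is the part that makes the Erlang version composable: the caller doesn't care which node the work landed on.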
| masklinn wrote: | Indeed, weird to only find it that low as a sub-comment: | spawn(Node, Module, Function, Args) -> pid() | Returns the process identifier (pid) of a new process | started by the application of Module:Function to Args on | Node. If Node does not exist, a useless pid is returned. | Otherwise works like spawn/3. | russellbeattie wrote: | Any time I see something about remote processes, I | immediately think, "Erlang could probably do this." | | I think Erlang would have been the programming language of | the 21st century... If only the syntax wasn't like line | noise and a printer error code had a baby, and raised it to | think like Lisp. | femto113 wrote: | Even further back this echoes some of the goals of MIT's | Project Athena started in 1980s. | lachlan-sneff wrote: | Wow, this is really interesting. I bet that there's a way of | doing this robustly by streaming wasm modules instead of full | executables to every server in the cluster. | peterkelly wrote: | There's been a bunch of interesting work done on this over the | years. Here's a literature survey on the topic: | https://dl.acm.org/doi/abs/10.1145/367701.367728 | pcr910303 wrote: | This makes me think about Urbit[0] OS -- Urbit OS represents the | entire OS state in a simple tree, this would be very simple to | implement. | | [0]: https://urbit.org/ | [deleted] | userbinator wrote: | _This can let you stream in new pages of memory only as they are | accessed by the program, allowing you to teleport processes with | lower latency since they can start running basically right away._ | | That's what "live migration" does; it can be done with an entire | VM: https://en.wikipedia.org/wiki/Live_migration | dmitrygr wrote: | Hey what's the best way to contact you? Have a question. If | possible my email is in my profile. | sitkack wrote: | https://www.usenix.org/node/170864 | trishume wrote: | It's what a certain kind of live migration strategy does. 
| There's a discussion thread about this further down where I | link to that same page. | sitkack wrote: | https://cloud.google.com/compute/docs/instances/live-migrati... | gives a great rundown on the mechanics as they pertain to GCP. | Is probably somewhat valid for other systems. | YesThatTom2 wrote: | Condor did this in the early 90s. | synack wrote: | This reminds me of OpenMOSIX, which implemented a good chunk of | POSIX in a distributed fashion. | | MPI also comes to mind, but it's more focused on the IPC | mechanisms. | | I always liked Plan 9's approach, where every CPU is just a file | and you execute code by writing to that file, even if it's on a | remote filesystem. | anfractuosity wrote: | I recall OpenMosix too, I remember playing with it on a couple | of old machines a while ago. I thought it seemed quite a cool | idea. I think I was trying to do mp3 encoding from CDs with it | in a distributed fashion. | adrianpike wrote: | Whoa, I was just talking about OpenMOSIX the other day - at my | college job, when we retired workstations, they would just sit | in the back closet for years until facilities would get around | to putting them up for auction. We set up a TFTP boot setup and | had a few dozen nodes at any given time. It wasn't high | performance by any measure, but it worked pretty transparently | and was always fun to throw heavy synthetic workloads at it and | watch the cluster rebalance and hear the fans spin up on the | clunky old Pentiums. | dghughes wrote: | Funny, almost exactly one year ago I was desperately trying to | learn about and demo a Linux cluster. | | I happened upon ClusterKnoppix and used that LiveCD as my demo, | which uses OpenMOSIX. And MPI as well, but I was having a lot of | trouble getting it all in my head and having to explain it | during a presentation. | | I'm glad that's over with! But it was still fun. | AstroJetson wrote: | Came here to write about ClusterKnoppix!
It was amazing and I | had 7 small form factor PCs running a mini cluster. It was | great for moving things around. Problem was it fell behind in | releases and it became hard to get applications to run on it. | Knoppix was how I got into Linux; it still ranks as my | favorite distro. Thanks Klaus Knopper for your work in | setting that up! | | It would be nice if it worked on Raspberry Pi, or if there | was a simple way to set OpenMOSIX up on Pis. | AstroJetson wrote: | For people that want to try OpenMOSIX out, take a look at this | site http://dirk.eddelbuettel.com/quantian.html He has a distro | called Quantian, with a big collection of science tools | added. Shame it's Sunday, I'll need to wait a week to pull it | down and see how well it flies. | synack wrote: | Ah, the Plan 9 bits I was remembering are actually in | Inferno... http://www.vitanuova.com/inferno/man/4/grid-cpu.html | notacoward wrote: | Same reaction here. MOSIX was basically this idea developed for | real. Turns out there are a bunch of secondary problems to do | with namespaces and security, scheduling and resource use, | topology, performance, and so on. In the end it never turned | out to be all that practical, and I know of several efforts | (e.g. LOCUS) that went quite far to reach the same conclusion. | There's a reason that non-transparent "shared nothing" | distributed computing came to dominate the computing landscape. | carapace wrote: | "Somebody else has had this problem." | | Don't get me wrong, this is great hacking and great fun. And this | is a good point: | | > I think this stuff is really cool because it's an instance of | one of my favourite techniques, which is diving in to find a | lesser-known layer of abstraction that makes something that seems | nigh-impossible actually not that much work.
Teleporting a | computation may seem impossible, or like it would require | techniques like serializing all your state, copying a binary | executable to the remote machine, and running it there with | special command line flags to reload the state. | Animats wrote: | That goes back to the 1980s, with UCLA Locus. This was a | distributed UNIX-like system. You could launch a process on | another machine and keep I/O and pipes connected. Even on a | machine with a different CPU architecture. They even shared file | position between tasks across the network. Locus was eventually | part of an IBM product. | | A big part of the problem is "fork", which is a primitive | designed to work on a PDP-11 with very limited memory. The way | "fork" originally worked was to swap out the process, and instead | of discarding the in-memory copy, duplicate the process table | entry for it, making the swapped-out version and the in-memory | version separate processes. This copied code, data, and the | process header with the file info. This is a strange way to | launch a new process, but it was really easy to implement in | early Unix. | | Most other systems had some variant on "run" - launch and run the | indicated image. That distributes much better. | wrs wrote: | No need to use the past tense -- CreateProcess is the primitive | in Windows NT. (It's been 20 years...can we just call that | Windows now?) | zozbot234 wrote: | posix_spawn is a thing as well. | barrkel wrote: | There's also an ergonomics to process creation APIs - rather | than needing separate APIs for manipulating your child process | vs manipulating your own process, fork() lets you use one to | implement the other: fork(), configure the resulting process, | then exec(). | | CreateProcess* on Windows is a relative monstrosity of | complexity compared to fork/exec. | peterwwillis wrote: | It's nice to see people re-discover old school tech. 
In cluster | computing this was generally called "application | checkpointing"[1] and it's still in use in many different systems | today. If you want to build this into your app for parallel | computing you'd typically use PVM[2]/MPI[3]. SSI[4] clusters | tried to simplify all this by making any process "telefork" and | run on any node (based on a load balancing algorithm), but the | most persistent and difficult challenge was getting shared memory | and threading to work reliably. | | It looks like CRIU support is bundled in kernels since 3.11[5], | and works for me in Ubuntu 18.04, so you can basically do this | now without custom apps. | | [1] https://en.wikipedia.org/wiki/Application_checkpointing [2] | https://en.wikipedia.org/wiki/Parallel_Virtual_Machine [3] | https://en.wikipedia.org/wiki/Message_Passing_Interface [4] | https://en.wikipedia.org/wiki/Single_system_image [5] | https://en.wikipedia.org/wiki/CRIU#Use | dekhn wrote: | Condor, a distributed computing environment, has done IO remoting | (where all calls to IO on the target machine get sent back to the | source) for several decades. The origin of Linux Containers was | process migration. | | I believe people have found other ways to do this; personally I | think the ECS model (like k8s, but the cloud provider hosts the | k8s environment), where the user packages up all the dependencies | and clearly specifies the IO mechanisms through late binding, | makes a lot more sense for distributed computing. | gmfawcett wrote: | For those like me who haven't heard of Condor before, it's | called HTCondor now: | | https://research.cs.wisc.edu/htcondor/description.html | vidarh wrote: | I clicked through to mention Condor too... I first came across | it in the 90's, and it seems like one of those obvious hacks | that keeps being reinvented.
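The CRIU flow peterwwillis mentions can be tried straight from a shell on a distro that ships criu. A rough session sketch (paths are illustrative; `--shell-job` is needed because the target was started from a terminal, and root is typically required):

```shell
# Checkpoint a running process to disk, then resurrect it.
# Assumes `criu` is installed and the kernel was built with
# CONFIG_CHECKPOINT_RESTORE=y (true for most modern distro kernels).
sleep 1000 &
PID=$!
mkdir -p /tmp/ckpt

# Dump: freezes the process, writes its memory, fds, etc. as image
# files, and kills the original.
sudo criu dump -t "$PID" -D /tmp/ckpt --shell-job

# Restore: rebuilds the process from the images, with the same PID.
sudo criu restore -D /tmp/ckpt --shell-job &
```

Copy `/tmp/ckpt` to another machine before the restore step and this becomes a crude "telefork" by hand, which is roughly what CRIU's live-migration tooling automates.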
| dekhn wrote: | I was actually channeling the creator of Condor, Miron Livny, | who has a history of going to talks about distributed | computing and pointing out that "Condor already does that" | for nearly everything that people try to tout as new and | cool. | | He's not wrong, but few people use Condor. | mempko wrote: | We use Condor. Oh, and did you know it can run docker | containers too? And it's constantly being updated and | improved. What is lacking, in my opinion, is a cool GUI to | monitor and spawn things. | iaresee wrote: | Pre-MS acquisition you used to be able to use Cycle | Cloud's software for ~free without support. Unfortunately | it went all-Azure-only post-acquisition. | iaresee wrote: | Few people outside academia, maybe? But inside it still | seems to dominate in areas like physics computation. CERN | uses a world wide Condor grid. LIGO too. It's excellent for | sharing cycles for those slow-burn, highly parallel, | massive data scale problems. | | I spent more than a decade bringing Condor to | semiconductor, financial and biomedical institutions. It was | always a fight to show them there was a better way to | utilize their massive server farms that didn't require | paying the LSF Tax. Without a shiny sales or marketing | department, Condor was hard to pitch to IT departments. | | Still, to this day, I see people doing things with "modern" | platforms like Kubernetes and such and I chuckle. Had that | in Condor 15 years ago in many cases. :) | eternauta3k wrote: | Why "LSF tax"? | gautamcgoel wrote: | I think he's referring to IBM LSF: | https://www.ibm.com/support/knowledgecenter/en/SSWRJV/produc... | iaresee wrote: | Yes, this exactly. | | If you want to dig deep into computer science history, | you'll note that LSF was once a slightly modified Condor. | There's a wild history between the two and U.Wisc and U | of T.
| 0xdeadbeefbabe wrote: | > Still, to this day, I see people doing things with | "modern" platforms like Kubernetes and such and I | chuckle. Had that in Condor 15 years ago in many cases. | :) | | I'm reading the docs and it seems used mostly for solving | long-running math problems like protein folding or | SETI@home? | | Can it be used for scaling a website too? I think that's | k8s' "killer" feature heh. | iaresee wrote: | Yes, it can scale based on metrics. And metrics can be | anything. | | What it's missing is all the discovery and network | plumbing to tie running instances together with load | balancing and inter-service comms. | | Google's old Borg paper mentions Condor as a thing they | considered and cribbed features from. | | Honestly, serving a website is not as different from | batch processing problems as you'd think. There are | differences but they're subtle, not mountainous. | yummypaint wrote: | I've used condor for ~5 years now, mostly for running | simulations and processing data. Everything I've done | with it has been trivially parallelizable (divide data | into chunks based on time, etc), and in those | applications it has been a superb tool that just works. | | It should be possible to run a scalable website with it, | but then you don't get "infinite" scalability like cloud | services offer, since you're limited by the size of the | compute pool. It would probably have its pitfalls. | | That being said, coming in without knowledge of either, I | found it much easier to learn and get started doing | things with condor than kubernetes. I had all kinds of | issues just getting simple things like LaTeX compilation | as part of gitlab CI to work reliably. Clearly the | experts know how to make things go, but condor has a lower | barrier to entry. For use cases where condor CAN work, | especially data processing, I always recommend it.
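For readers who haven't seen it, the Condor workflow described in the comments above is driven by a small submit description file. A hypothetical example (the file and script names are made up) that fans a simulation out over ten slots:

```
# sim.sub -- hypothetical HTCondor submit description
executable   = run_sim.sh
arguments    = $(Process)
output       = out.$(Process).txt
error        = err.$(Process).txt
log          = sim.log
request_cpus = 1
queue 10
```

`condor_submit sim.sub` queues ten jobs; each job's ClassAd is then matched against machine ads by the negotiator, which is the double-evaluation mechanism praised elsewhere in this thread.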
| dekhn wrote: | Most systems like Condor have a concept that a job or | task is something that "comes up, runs for a while, | writes to log files, and then exits on its own, or after | a deadline". I've talked to the various batch queue | providers and I don't think they really consider | "services" (like a webserver, app server, rpc server, or | whatever) in their purview. | | In fact, that was what struck me the most when I started | at Google (a looong time ago): at first I thought of borg | like a batch queue system, but really it's more of a | "Service-keeper-upper" that does resource management, and | batch jobs are just sort of one type of job (webserving, etc. | are examples of "Service jobs") laid on top of the RM | system. | | Over time I've really come to prefer the Google approach; | for example, when a batch job is running, it still listens | on a web port, serves up a status page, and numerous | other introspection pages that are great for debugging. | | TBH I haven't read the Condor, PBS, LSF manuals in a | while so it's entirely possible they handle service jobs | and the associated problems like dynamic port management, | task discovery, RPC balancing, etc. | iaresee wrote: | But in a world where you're continuously deploying on a | cadence that's incredibly quick, how do things differ? I | contend the batch and online worlds start to get pretty | blurry at this stage. We're not in a world where bragging | about uptime on our webserver in years is a thing any | more. | | I was routinely using Condor in semiconductor R&D and | running batches of jobs where each job was running for | many days -- that's probably far longer than any single | instance of a service exists at Google in this day and | age, right? | | None of the batch stuff does the networking management | though. No port mapping, no service discovery | registration, no load balancer integration, etc. That's | Kubernetes sugar they lack.
But...has never struck me as | overly hard to add, especially if you use Condor's Docker | runner facilities. | | Edit: I should say that I don't _really_ think you could | swap out Kubernetes for Condor. Not easily. But it's | always been in my long list of weekend projects to see | what running a cluster of online services would be like | on Condor. I don't think it'd be awful or all that hard. | | The other killer Condor tech is their database | technology. The double-evaluation approach of ClassAd is | so fantastic for non-homogenous environments. Where loads | have needs and computational nodes have needs and | everyone can end up mostly happy. | concernedctzn wrote: | side note: take a look at this guy's other blog posts, they're | all very good | cecilpl2 wrote: | This is similar to what Incredibuild does. It distributes compile | and compute jobs across a network, effectively sandboxing the | remote process and forwarding all filesystem calls back to the | initiating agent. | touisteur wrote: | I wonder whether the effort by the syzkaller people (@dyukov) | could help with the actual description of all the syscalls (which | the author says people gave up on for now because it's too | complex), since they need them to be able to fuzz efficiently... | vladbb wrote: | I implemented something similar ten years ago for a class | project: https://youtu.be/0am-5noTrWk | fitzn wrote: | Really cool idea! Thanks for providing so much detail in the | post. I enjoyed it. | | A somewhat related project is the PIOS operating system, written | 10 years ago but still used today to teach the operating systems | class there. The OS has different goals than your project but it | does support forking processes to different machines and then | deterministically merging their results back into the parent | process. Your post reminds me of it. There's a handful of papers | that talk about the different things they did with the OS, as | well as their best paper award at OSDI 2010.
| | https://dedis.cs.yale.edu/2010/det/ | [deleted] | cjbprime wrote: | Amazing work. | ignoramous wrote: | Yep. Reminds me of u/xal praising u/trishume: | https://news.ycombinator.com/item?id=20490964 | londons_explore wrote: | Bonus points if you can effectively implement the "copy on write" | ability of the Linux kernel to only send over pages to the remote | machine that are changed either in the local or remote fork, or | read in the remote fork. | | An rsync-like diff algorithm might also substantially reduce | copied pages if the same or a similar process is teleforked | multiple times. | | Many processes have a lot of memory which is never read or | written, and there's no reason that should be moved, or at least | no reason it should be moved quickly. | | Using that, you ought to be able to resume the remote fork in | milliseconds rather than seconds. | | userfaultfd() or mapping everything to files on a FUSE filesystem | both look like promising implementation options. | trishume wrote: | I do in fact mention this idea in the article. Indeed, | userfaultfd was added to the kernel so that CRIU and KVM live | migration could implement exactly this. | | Another cool project that does something like this is | https://github.com/gamozolabs/chocolate_milk which is a fuzzing | hypervisor kernel that can back a VM snapshot memory mapping | over the network to only pull down the pages that the VM | actually reads during the fuzz case. | mlyle wrote: | If you just pull things on demand, you're going to get a lot of | round-trip-time penalties to page things in. | | I think you should still be pushing the memory as fast as you | can, but maybe you start the child while this is still in | progress, and prioritize sending stuff the child asks for | (reorder to send that stuff "next"), if you've not already sent | it. | lunixbochs wrote: | That's a great idea.
One of my thoughts was to "pre-heat" the | process by executing a bit locally with side effects disabled | to see what would get immediately accessed and send that | first. | | If your systems strictly match somehow (machine image with | auto update disabled? or regularly hash and timestamp files | on both systems) you can also cheat by mapping some of the | files locally on the other side. | trishume wrote: | Yah, that is indeed a super important optimization for | avoiding round trips. CRIU does this and calls it "pre- | paging"; their wiki also mentions that they adapt their page | streaming to try to pre-stream pages around pages that have | been faulted: | https://en.wikipedia.org/wiki/Live_migration#Post-copy_memor... | | edit: lol, I didn't realize that isn't CRIU's wiki, since they | just linked to a Wikipedia page and both run MediaWiki | software. This is the actual CRIU wiki page, and it's way | harder to tell if they do this, although I suspect they do | and it's in the "copy images" step of the diagram: | https://criu.org/Userfaultfd | [deleted] | touisteur wrote: | If you're at the hypervisor level you can also use Intel PML; | seems it's made for this. https://arxiv.org/abs/2001.09991 | | I'm guessing @gamozolabs (twitter) is going to use it some day | soon to fuzz billions of cases per second, with snapshot | fuzzing... | anticensor wrote: | hoard() would be a better name. | jabedude wrote: | I don't understand, could you explain? `tfork(2)` seems more in | line with Linux naming conventions. | anticensor wrote: | Telefork as a verb does not map to a real-world concept. Hoard | fits better, as in hoarding another computer for your | process. | jabedude wrote: | That is funny. hoard() and steal() could work. | anticensor wrote: | Agree, steal() instead of telepad() would be a good idea. | carapace wrote: | Ha! I thought you had misspelled _horde!_ ;-) | buckminster wrote: | I think you mishurd.
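The lazy-paging discussion above (userfaultfd, pre-paging, post-copy migration) boils down to: materialize a page only on first access, and win on every page the program never touches. A toy Python model of the post-copy idea, with illustrative names and the "network" reduced to a byte string standing in for the source machine's memory image:

```python
PAGE = 4096

class RemoteMemory:
    """Toy model of post-copy ("lazy") migration: the destination starts
    with no pages resident and pulls each 4 KiB page from the source only
    on first access, the way userfaultfd lets CRIU/KVM service real page
    faults over the network. All names here are illustrative."""

    def __init__(self, source: bytes):
        self._source = source   # stands in for the origin machine's image
        self._pages = {}        # pages materialized at the destination
        self.faults = 0         # how many "network fetches" we paid for

    def read(self, addr: int) -> int:
        page_no, off = divmod(addr, PAGE)
        if page_no not in self._pages:  # "page fault": fetch one page
            self.faults += 1
            start = page_no * PAGE
            self._pages[page_no] = self._source[start:start + PAGE]
        return self._pages[page_no][off]

mem = RemoteMemory(bytes(range(256)) * 64)   # 16 KiB image = 4 pages
mem.read(0); mem.read(1); mem.read(5000)
print(mem.faults)  # → 2: two distinct pages touched, not the whole image
```

The pre-paging optimization trishume mentions would amount to speculatively fetching the pages adjacent to `page_no` on each fault, trading bandwidth for fewer round trips.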
| dreamcompiler wrote: | Telescript [0] is based on this idea, although at a higher level. | I wish we could just build Actor-based operating systems and then | we wouldn't need to keep reinventing flexible distributed | computation, but alas...[1] | | [0] | https://en.wikipedia.org/wiki/Telescript_(programming_langua... | | [1] Yes I know Erlang exists. I wish more people would use it. ___________________________________________________________________ (page generated 2020-04-26 23:00 UTC)