[HN Gopher] Linear Address Spaces: Unsafe at any speed ___________________________________________________________________ Linear Address Spaces: Unsafe at any speed Author : gbrown_ Score : 115 points Date : 2022-06-29 19:45 UTC (3 hours ago) (HTM) web link (queue.acm.org) (TXT) w3m dump (queue.acm.org) | Veserv wrote: | Of course things would be faster if we did away with coarse | grained virtual memory protection and instead merged everything | into a single address space and guaranteed protection using fine | grained permission mechanisms. | | The problem with that is that a single error in the fine grained | mechanism anywhere in the entire system can quite easily cause | complete system compromise. To achieve any safety guarantees | requires achieving perfect safety guarantees across all arbitrary | code in your entire deployed system. This is astronomically | harder than ensuring safety guarantees using virtual memory | protection where you only need to analyze the small trusted code | base establishing the linear address space and do not need to be | able to analyze or even understand arbitrary code to enforce | safety and separation. | | For that matter, fine grained permissions are a strict superset | of the prevailing virtual memory paradigm as you can trivially | model the existing coarse grained protection by just making the | fine grained protection more coarse. So, if you can make a safe | system using fine grained permissions then you can trivially | create a safe system using coarse grained virtual memory | protection. And, if you can do that then you can create a | unhackable operating system right now using those techniques. So | where is it? | | Anybody who claims to be able to solve this problem should first | start by demonstrating a mathematically proven unhackable | operating system as that is _strictly easier_ than what is being | proposed. Until they do that, the entire idea is a total | pipedream with respect to multi-tenant systems. | VogonPoetry wrote: | I think that the plague of speculative execution bugs qualify | as a single error in virtual memory systems that cause complete | system compromise. This was not a logic error in code, but a | flaw in the hardware. It isn't clear to me if CHERI would have | been immune to speculative execution problems, but access | issues would likely have shown up if the memory ownership tests | were in the wrong place. | | I have been following CHERI. I note that in order to create the | first FPGA implementation they had to first define the HDL for | a virtual memory system -- all of the research "processor" | models that were available did not have working / complete VM | implementations. CHERI doesn't replace VM, it is in addition to | having VM. | | I've found that memory bugs (including virtual memory ones) are | difficult to debug, because the error is almost never in the | place where the failures show up and there is no easy way to | track back who ought to own the object or how long ago the | error happened. CHERI can help with this by at least being able | to identify the owner. | | Virtual memory systems are usually pretty complex. Take a look | at the list of issues for the design of L3 | <https://pdos.csail.mit.edu/6.828/2007/lec/l3.html>. The | largest section there is for creating address spaces. For the | Linux kernel, in this diagram a lot of the MM code is colored | green <https://i.stack.imgur.com/1dyzH.png>, it is a | significant portion. More code means more bugs and much harder | to formally verify. 
| | I am not convinced by the argument that it is possible to take | a fine grained system and trivially expand it to a coarse | grained system. How is shared memory handled, mmap'ed dylibs, | page level copy-on-write? | [deleted] | potatoalienof13 wrote: | You have misunderstood the article. It is not advocating for | the return to single address space systems. It is advocating | for potential alternatives to the linear address space model. | Here [1] is an operating system that I think fits under the | description of what you were talking about. | | https://en.wikipedia.org/wiki/Singularity_%28operating_syste... | [1] | Genbox wrote: | The more I research Singularity, the more I like it. I deep | dived into all the design docs years ago and the amount of | rethinking of existing OS infrastructure is astounding. | | Joe Duffy has some great blog posts on Midori (OS based on | Singularity) here: | http://joeduffyblog.com/2015/11/03/blogging-about-midori/ | infogulch wrote: | The Mill's memory model is one of its most interesting features | IMO [1] and solves some of the same problems, but by going the | other way. | | On the Mill the whole processor bank uses a global virtual | address space. The TLB and mapping to physical memory happen at the | _memory controller_. Everything above the memory controller is in | the same virtual address space, including L1-L3+ caches. This | solves _a lot_ of problems, for example: If you go out to main | memory you're already paying ~300 cycles of latency, so having a | large silicon area / data structure for translation is no longer | a 1-cycle latency problem. Writes to main memory are flushed down | the same memory hierarchy that reads come from and succeed as | soon as they hit L1. Since all cache lines are in the same | virtual address space you don't have to track and synchronize | reads and writes across translation zones within the cache | hierarchy. When you request an unallocated page you get the whole | pre-zeroed page back _instantly_, since it doesn't need to be | mapped to physical pages until writes are flushed out of L3. This | means it's possible for a page to be allocated, written to, read, | and deallocated in a sequence that _never actually touches physical memory_; | the whole workload is served | purely within the cache hierarchy. | | Protection is a separate system ("PLB") and can be much smaller | and more streamlined since it's not trying to do two jobs at | once. The PLB allows a process to give fine-grained temporary | access to a portion of its memory to another process; RW, Ro, Wo, | byte-addressed ranges, for one call or longer, etc. Processes get | allocated available address space on start; they can't just | assume they own the whole address space or start at some specific | address (you should be using ASLR anyway so this should have no | effect on well-formed programs, though there is a legacy | fallback). | | [1]: My previous comment: | https://news.ycombinator.com/item?id=27952660 | pclmulqdq wrote: | The Mill model is kind of cool, but today, many peripherals | (including GPUs and NICs) have the ability to dump bytes | straight into L3 cache. This improves latency in a lot of | tasks, including the server-side ones that the Mill is designed | for. This is possible due to the fact that MMUs are above the | L3 cache. | | Honestly, I'm happy waiting for 4k pages to die and be replaced | by huge pages. Page tables were added to the x86 architecture | in 1985, when 1MB of memory was a ton of memory to have.
Having | 256 pages worth of memory in your computer was weird and | exotic. Fast forward to today, and the average user has several | GB of memory - mainstream computers can be expanded to over 128 | GB today - and we still mainly use 4k pages. That is the | problem here. If we could swap to 2M pages in most | applications, we would be able to reduce page table sizes by a | factor of 512, and they would still be a lot larger than page | tables when virtual memory was invented. And we wouldn't waste | much memory! | | But no, 4k pages for backwards compatibility. 4k pages forever. | And while we're at it, let's add features to Linux (like TCP | zero copy) that rely on having 4k pages. | a-dub wrote: | > Why do we even have linear physical and virtual addresses in | the first place, when pretty much everything today is object- | oriented? | | are there alternatives to linearly growing call stacks? | robotresearcher wrote: | A stack is a list of objects with a LIFO interface. Doesn't | have to be a contiguous byte sequence. | a-dub wrote: | is there an example of machine code that doesn't make use of | a linear contiguous call stack? | | what would the alternative be? compute the size of all the | stack frames a-priori in the compiler and then spray them all | over main memory and then maintain a linear contiguous list | of addresses? doesn't the linear contiguous nature of | function call stacks in machine code preserve locality in | order to make more efficient use of caches? or would the | caches have to become smarter in order to know to preserve | "nearby" stack frames when possible? | | also, why not just make the addresses wider and put the pid | in the high bits? they're already doing this masking stuff | for the security descriptors, why not just throw the pid in | there as well and be done with it? | robotresearcher wrote: | The linked article doesn't mention call stacks explicitly, | but describes the R1000 arch was object+offset addressed in | HW. So unless they restricted the call stack to fit into | one object and use only offsets, then yes, they must have | chained objects together for the stack. | | When you have a page-based memory model, you've created the | importance of address locality. If you have object-based | memory model, and the working set is of objects, not pages, | then address locality between objects doesn't matter. | | Of course, page-based based memory models are by FAR the | most common in practice. | | (Note: pages ARE objects, but the objects are significant | to the VM system and not to your program. So strictly, | page-based models are a corner case of object-based models, | where the objects are obscure.) | a-dub wrote: | would be interesting to see how the actual call stack is | implemented. they must either have a fixed width object | as you mention or some kind of linear chaining like | you're describing. | | found this on wikipedia: https://resources.sei.cmu.edu/as | set_files/TechnicalReport/19... | | memory and disk are unified into one address space, code | is represented by this "diana" structure which can be | compressed text, text, ast or machine code. would be | curious how procedures are represented in machine code. | | what a fascinating machine! | Someone wrote: | > is there an example of machine code that doesn't make use | of a linear contiguous call stack? | | Early CPUs didn't have support for a stack, and some early | languages such as COBOL and Fortran didn't need one. 
They | didn't allow recursive function calls, so return addresses | could be stored at fixed addresses, and a return could | either be an indirect jump reading from that address or a | direct jump whose target address got modified when writing | to that fixed address (see | https://people.cs.clemson.edu/~mark/subroutines.html for | the history of subroutine calls) | | Both go (https://blog.cloudflare.com/how-stacks-are- | handled-in-go) and rust | (https://mail.mozilla.org/pipermail/rust- | dev/2013-November/00...) initially had split stacks | (https://releases.llvm.org/3.0/docs/SegmentedStacks.html, | https://gcc.gnu.org/wiki/SplitStacks) | kazinator wrote: | > _Why do we even have linear physical and virtual addresses in | the first place, when pretty much everything today is object- | oriented?_ | | Simple: we don't want some low level kernel memory management | dictating what constitutes an "object". | | Everything isn't object-oriented. E.g. large arrays, memory- | mapped files, including executables and libraries. | | Linear memory sucks, but every other organization sucks more. | | Segmented has been done; the benefit-to-clunk ratio was | negligible. | MarkSweep wrote: | The benefit-to-thunk ratio was not great either. | | ( one reference to thunks involving segmented memory: | https://devblogs.microsoft.com/oldnewthing/20080207-00/?p=23... | ) | kazinator wrote: | Real segmentation would have solved the problem described in | the article. Under virtual memory segments like on the 80386 | (and mainframes before that), you can physically relocate a | segment and while adjusting its descriptor so that the | addressing doesn't change. | | The problem was mainly caused by having no MMU, so moving | around objects in order to save space required adjusting | pointers. Today, a copying garbage collector will do the same | thing; rewrite all the links among the moved objects. You'd | have similar hacks on Apple Macintoshes, with their MC68K | processors and flat space. | mwcremer wrote: | tl;dr page-based linear addressing induces performance loss with | complicated access policies, e.g. multilevel page tables. Mr. | Kamp would prefer an object model of memory access and | protection. Also, CHERI | (https://dl.acm.org/doi/10.5555/2665671.2665740) increases code | safety by treating pointers and integers as distinct types. | gumby wrote: | The Multics system was designed to have segments (for this | discussion == pages) that were handled the way he described, down | to the pointer handling. Not bad for the 1960s, though Unix was | designed for machines with a lot fewer transistors back at the | time when that mattered a lot. | | Things like TLBs (not a new invention, but going back to the | 1960s) really only matter to systems programmers, as he says, and | judicious use simplifies and has simplified programming for a | long time. I think if he really wants to go down this path he'll | discover that the worst case behavior (five probes to find a | page) really is worth it in the long run. | anewpersonality wrote: | CHERI is a gamechanger | gralx wrote: | Link didn't work for me. Direct link did: | | https://dl.acm.org/doi/abs/10.1145/3534854 | scottlamb wrote: | tl;dr: conventional design bad, me smart, capability-based | pointers (base+offset with provenance) can replace virtual | memory, CHERI good (a real modern implementation of capability- | based pointers). | | The first two points are similar to other Poul-Henning Kamp | articles [1]. The last two are more interesting. 
| | I'm inclined to agree with "CHERI good". Memory safety is a huge | problem. I'm a fan of improving it by software means (e.g. Rust) | but CHERI seems attractive at least for the huge corpus of | existing C/C++ software. The cost is doubling the size of | pointers, but I think it's worth it in many cases. | | I would have liked to see more explanation of how capability- | based pointers replacing virtual memory would actually work on a | modern system. | | * Would we give up fork() and other COW sorts of tricks? | Personally I'd be fine with that, but it's worth mentioning. | | * What about paging/swap/mmap (to compressed memory contents, | SSD/disk, the recently-discussed "transparent memory offload" | [2], etc)? That seems more problematic. Or would we do a more | intermediate thing like The Mill [3] where there's still a | virtual address space but only one rather than per-process | mappings? | | * What bookkeeping is needed, and how does it compare with the | status quo? My understanding with CHERI is that the hardware | verifies provenance [4]. The OS would still need to handle the | assignment. My best guess is the OS would maintain analogous data | structures to track assignment to processes (or maybe an extent- | based system rather than pages) but maybe the hardware wouldn't | need them? | | * How would performance compare? I'm not sure. On the one hand, | double pointer size => more memory, worse cache usage. On the | other hand, I've seen large systems spend >15% of their time | waiting on the TLB. Huge pages have taken a chunk out of that | already, so maybe the benefit isn't as much as it seemed a few | years ago. Still, if this nearly eliminates that time, that may | be significant, and it's something you can measure with e.g. | "perf"/"pmu-tools"/"toplev" on Linux. | | * etc | | [1] eyeroll at https://queue.acm.org/detail.cfm?id=1814327 | | [2] https://news.ycombinator.com/item?id=31814804 | | [3] http://millcomputing.com/wiki/Memory#Address_Translation | | [4] I haven't dug into _how_ when fetching pointers from RAM | rather than pure register operations, but for the moment I 'll | just assume it works, unless it's probabilistic? | throw34 wrote: | "The R1000 addresses 64 bits of address space instantly in every | single memory access. And before you tell me this is impossible: | The computer is in the next room, built with 74xx-TTL | (transistor-transistor logic) chips in the late 1980s. It worked | back then, and it still works today." | | That statement has to be coming with some hidden caveats. 64 bits | of address space is crazy huge so it's unlikely the entire range | was even present. If only a subset of the range was "instantly" | available, we have that now. Turn off main memory and run right | out of the L1 cache. Done. | | We need to keep in mind, the DRAM ICs themselves have a hierarchy | with latency trade-offs. | https://www.cse.iitk.ac.in/users/biswap/CS698Y/lectures/L15.... | | This does seem pretty neat though. "CHERI makes pointers a | different data type than integers in hardware and prevents | conversion between the two types." | | I'm definitely curious how the runtime loader works. 
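To make the CHERI property quoted above more concrete: a capability behaves like a "fat pointer" that carries its own bounds and permissions, checked by the hardware on every access, and it cannot be forged from a plain integer. The C sketch below is only a conceptual model; the struct layout and names are invented for illustration, and the real CHERI encoding is a compressed 128-bit format plus a hidden validity tag that software cannot set directly.

    #include <stdint.h>
    #include <stddef.h>
    #include <assert.h>

    /* Conceptual model only: a capability is a pointer that carries its
       own base, length and permissions. Real CHERI compresses this and
       adds an out-of-band validity tag set only by legal derivation. */
    enum { PERM_READ = 1, PERM_WRITE = 2 };

    struct capability {
        uintptr_t base;    /* lowest address this pointer may touch */
        size_t    length;  /* size of the region it was derived for */
        size_t    offset;  /* current position within that region   */
        unsigned  perms;   /* subset of PERM_READ | PERM_WRITE      */
    };

    /* Every load or store is bounds- and permission-checked. On CHERI
       the CPU performs this check itself and faults on violation. */
    uint8_t cap_load_u8(struct capability c)
    {
        assert((c.perms & PERM_READ) && c.offset < c.length);
        return *(const uint8_t *)(c.base + c.offset);
    }

Because an integer carries none of this metadata (and, on real hardware, lacks the validity tag), casting an integer to a pointer cannot conjure access rights, which is the "pointers are not integers" property the quoted sentence describes.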
| cmrdporcupine wrote: | _"We need to keep in mind, the DRAM ICs themselves have a | hierarchy with latency trade-offs"_ Yes, this is the thing -- | I'm not a hardware engineer or hardware architecture expert, | but -- it seems to me that what we have now is a set of | abstractions presented by the hardware to the software based on | a model of what hardware "used to" look like, mostly what it | used to look like in a 1970s minicomputer, when most of the | intensive key R&D in operating systems architecture was done. | | One can reasonably ask, like Mr Kamp is, why we should stick to | these architectural idols at this point in time. It's | reasonable enough, except that the alternative of heterodox, | alternative architectures is also heterogeneous -- new concepts | that don't necessarily "play well with others." All our | compiler technology, all our OS conventions, our tooling, etc. | would need to be rethought under new abstractions. | | And those are fun hobby or thought exercises, but in the real | world of industry, they just won't happen. (Though I guess from | TFA it could happen in a more specialized domain like | aerospace/defence) | | In the meantime, hardware engineering is doing amazing things | building powerfully performing systems that give us some nice | convenient consistent (if sometimes insecure and awkward) myths | about how our systems work, and they're making them faster | every year. | bentcorner wrote: | Makes me wonder if 50 years from now we'll still be stuck | with the hardware equivalent of the floppy disk icon, only | because retooling the universe over from scratch is too | expensive. | nine_k wrote: | As they say, C was designed for the PDP-11 architecture, and | modern computers are forced to emulate it, because the tools | to describe software (languages and OSes) which we have can't | easily describe other architectures. | | There were modern semi-successful attempts though, see the PS3 / | Cell architecture. It did not stick though. | | I'd say that the modern heterodox architecture domain is | GPUs, but we have one proprietary and successful interface | for them (CUDA), and the open alternatives (OpenCL) are | markedly weaker yet. And it's not even touching the OS | abstractions. | jart wrote: | You can avoid the five levels of indirection by using "unreal | mode". I just wish it were possible to do with 64-bit code. | cmrdporcupine wrote: | "The R1000 has many interesting aspects ... the data bus is 128 | bits wide: 64-bit for the data and 64-bit for data's type" | | _what what what?_ | | How on earth would you ever need to have a type enumeration 2^64 | long? | | Neat, though. | btilly wrote: | My guess is that it is an object-oriented system. The data's | type is a pointer to the address that defines the type. Which | could be anywhere in the system. | | This is also a security feature. If you find a way to randomly | change the data's type, you're unlikely to successfully change | it to another type. | kimixa wrote: | The other option is to use those 64 bits to double the total | bandwidth in the "traditional" page-table system. | | All this extra complexity and bus width doesn't come for free, | after all; there's opportunity cost. | KerrAvon wrote: | No idea, but consider that it could be an enum + bitfield rather | than strictly an enum.
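A rough picture of what btilly and KerrAvon are guessing at: every 64-bit datum travels with a 64-bit tag, and that tag can itself be a reference to a type object rather than a small enum. The layout and names below are invented for illustration; the article does not describe the R1000's actual encoding.

    #include <stdint.h>

    /* Hypothetical model of a 128-bit tagged bus word. */
    struct type_object;                  /* type description, stored in memory */

    struct tagged_word {
        uint64_t                  data;  /* the 64-bit value itself            */
        const struct type_object *type;  /* 64-bit tag: a reference to its type */
    };

If the tag is a reference, a randomly corrupted tag will almost never point at a valid type object, which is the safety property btilly mentions; an enum plus bitfield, as KerrAvon suggests, would be the cheaper alternative.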
| robotresearcher wrote: | I don't know if this machine supported it, but it could allow | you to have a system-wide unique type for this-struct-in-this- | thread-in-this-process, with strong type checking all the way | through the compiler into run time. Which would be pretty cool. | | GUIDs for types. | gpderetta wrote: | At Intel they probably still have nightmares about the iAPX 432. They | are not going to try an OO architecture again. | | Having said that, I wouldn't be surprised if some form of | segmentation became popular again. | KerrAvon wrote: | I'd hope that anyone at Intel with said nightmares would have | read this paper by now (wherein Bob Colwell et al. argue that | the 432 could have been faster with some minor fixes, and | competitive with contemporary CPUs with some additional larger | modifications). | | https://archive.org/details/432_complexity_paper/ | gumby wrote: | The underexplored value of early segmentation was the | discretionary segment-level permissions enforced by hardware. | | Years ago I prototyped a system that had filesystem permission | support at the segment level. The idea was you could have a | secure dynamic library for, say, manipulating the passwd file | (you can tell how long ago that was). You could call into it if | you had the execute bit set appropriately, even if you didn't | have the read bit set, so you couldn't read the memory but | could call into it at the allowed locations (i.e. the PLT was x | only). | | However it was clear everyone wanted to get rid of the segment | support, so that idea never went anywhere. | monocasa wrote: | They made a decent go at it again in 16- and 32-bit protected | mode. The GDT and LDT along with task gates were intended to be | used as a hardware object capability system like the iAPX | 432's. | kimixa wrote: | I'm a little confused about how the object base is looked up in | these systems, whether they're sparse or dense and have any size | or total object count limitations, and whether that ends up having the | same limitations on total count as the page tables that required the | current multi-level approach. | | Surely you could consider a page table as effectively | implementing a fixed-size "object cache"? It is just a lookup for | an offset into physical memory, after all, with the "object ID" | just being the masked first part of the address? And if the | objects are variable sized, is it possible to end up with | physical address fragmentation as objects of different sizes are | allocated and freed? | | The claim of single-cycle lookups today would require an on-chip | fixed-size (and small!) fast SRAM, as there's a pretty hard limit | on the amount of memory you can read in a single clock | cycle, no matter how fancy or simple the logic behind deciding to | look up. If we call this area the "TLB", haven't we got back to | page tables again? | | And for the size of the SRAM holding the TLB/object cache entries - | increasing the amount of data stored in them means you have less | total too. A current x86_64 CPU supports 2^48 bytes of physical address | space, reduced to 36 bits if you know it's 4k aligned - and 2^57 | bytes of virtual address space as the tag, again reduced to 45 bits if | we know it's 4k aligned. That means to store the tag and physical | address you need a total of 81 bits of SRAM. A 64-bit object ID, | plus 64-bit physical address plus 64-bit size is 192 bits, over 2x | that, so you could pack 2x the number of conventional TLB entries into the | same SRAM block.
To match the capabilities of the example above, | 57 bits of physical address (which cannot be reduced, as arbitrary sizes | mean it's not aligned), plus an object ID and size each similarly reduced to | 48 bits, still adds up to 153 bits, only slightly less than | 2x, though I'm sure people could argue that reducing the | capabilities here has merit; I don't know how many objects, or | of what maximum size, such a system would need. And that's "worst | case" 4k pages for the page-table system too. | | I can't see how this idea could be implemented without extreme | limitations - look at the TLB size of modern processors and | that's the maximum number of objects you could have while meeting | the claims of speed and simplicity. There may be some advantage | in making them flexible in terms of size, rather than fixed-size, | but then you run into the same fragmentation issues, and need to | keep that size somewhere in the extremely tight TLB memory. | monocasa wrote: | > Surely you could consider a page table as effectively | implementing a fixed-size "object cache"? It is just a lookup | for an offset into physical memory, after all, with the "object | ID" just being the masked first part of the address? And if the | objects are variable sized, is it possible to end up with | physical address fragmentation as objects of different sizes | are allocated and freed? | | Because that's only a base, not a limit. The right pointer | arithmetic can spill over into any other object base's memory. | marshray wrote: | > with the "object ID" just being the masked first part of the | address? | | Doesn't that imply the minimum-sized object requires 4K of | physical RAM? | | Is that a problem? | kimixa wrote: | Maybe? If you just round up each "object" to 4k then you can | implement this using the current PTE on x86_64, but this | removes the (supposed) advantage of only requiring a single | PTE for each object (or "object cache" lookup entry or | whatever you want to call it) in the cases where an object | spans multiple page-sizes of data. | | Having arbitrarily sized objects will likely be possible in | hardware - it's just an extra size being stored in the PTE if | you can mask out the object ID from the address (in the example | in the original post, it's a whole 64-bit object ID, allowing | a full 64 bits of offset within each object, but totaling a | HUGE 128-bit effective address). | | But arbitrary sizes feel like they push the issues that many | current userspace allocators have to deal with today down to the | hardware/microcode - namely packing to cope with | fragmentation and similar (only instead of virtual address | space they'll have to deal with physical address space). The | solutions to this today are certainly non-trivial and can still | fail in many ways, so this is far from being solved, let | alone solved in a way simple enough to be implemented that | close to the hardware. | avodonosov wrote: | Since this addressing scheme is <object, offset>, and as these | pairs need to fit in 64 bits, I am curious whether the number of bits | for each part is fixed, and what those fixed widths are. In other | words, what is the maximum possible offset within one object and | the max number of objects? | | Probably segment registers in x86 can be thought of as object | identifiers, thus allowing the same non-linear approach? (Isn't | that the purpose of segments even?) | | Update: BTW, another term for what the author calls "linear" is | "flat".
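One way to picture the <object, offset> translation being discussed: the object part of the address indexes a descriptor table holding a base and a limit, much as an x86 segment selector indexes the GDT/LDT. The sketch below assumes an invented 16-bit/48-bit split of a 64-bit address; the article does not state the R1000's actual widths, and the names here are illustrative only.

    #include <stdint.h>
    #include <stdbool.h>

    /* Assumed split: high 16 bits name the object, low 48 bits are the
       offset inside it. These widths are illustrative only. */
    #define OBJ_BITS    16
    #define OFFSET_BITS 48
    #define OFFSET_MASK ((UINT64_C(1) << OFFSET_BITS) - 1)

    struct descriptor {
        uint64_t base;   /* where the object's storage starts */
        uint64_t limit;  /* object size in bytes               */
    };

    /* One descriptor per object, analogous to segment descriptors. */
    static struct descriptor table[1u << OBJ_BITS];

    /* Translate <object, offset>; reject accesses past the object's end,
       which is the per-object bounds check a flat address space lacks. */
    bool translate(uint64_t addr, uint64_t *out)
    {
        uint64_t obj = addr >> OFFSET_BITS;
        uint64_t off = addr & OFFSET_MASK;
        if (off >= table[obj].limit)
            return false;            /* bounds violation: raise a fault */
        *out = table[obj].base + off;
        return true;
    }

Whether a table like this can stay single-cycle at realistic object counts is exactly the sizing concern kimixa raises above.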
| monocasa wrote: | Yeah, x86 segments in the protected modes were intended to be | used as a hardware object capability system like the author is | getting at. | | And yeah, it's probably a fixed 64bit lookup into an object | descriptor table. | marshray wrote: | Wouldn't it be hilarious if the 21st century brought about | the re-adoption of the security design features introduced in | the 80286 (1982)? | monocasa wrote: | I came this close to ordering custom "Make the LDT Great | Again" hats after spectre was released, lol. | dragontamer wrote: | > Why do we even have linear physical and virtual addresses in | the first place, when pretty much everything today is object- | oriented? | | Well, GPU code is certainly not object-oriented, and I hope it | never becomes that. SIMD code won't be able to jump between | objects like typical CPU-oriented OOP does (unless all objects | within a warp/workgroup jump to the same function pointers?) | | GPU code is common in video games. DirectX needs to lay out its | memory very specifically as you write out the triangles and other | vertex/pixel data for the GPU to later process. This memory | layout is then memcopy'd over to PCIe using the linear address | space mechanism, and GPUs are now cohesive with this space | (thanks to Shared Virtual Memory). | | So today, thanks to shared virtual memory and advanced atomics, | we can have atomic compare-and-swap coordinate CPU and GPU code | operating over the same data (and copies of that data can be | cached in CPU-ram or GPU-VRAM and transferred over automatically | with PCIe memory barriers and whatnot). | | ---------- | | Similarly, shared linear address spaces operate over rDMA (remote | direct memory access), a protocol built on top of Ethernet. This | means that your linear memory space is mmap'd on your CPU, but | then asks for access to someone else's RAM over the network. The | mmap then causes this whole "inefficient pointer-traversals" to | then get turned into Ethernet packets to share RAM between CPUs. | | Ultimately, when you start dealing with high-speed data-sharing | between "external" compute units (ie: a GPU, or a ethernet- | connected far-away CPU), rather than "just" a NUMA-node or other | nearby CPU, the linear address space seems ideal. | | -------- | | Even the most basic laptop, or even Cell Phone, these days, is a | distributed system consisting of a CPU + GPU. Apple chips even | have a DSP and a few other elements. Passing data between all of | these things makes sense in a distributed linear address space | (albeit really wonky with PCIe, mmaps, base address pointers and | all sorts of complications... but they are figured out, and it | does work every day) | | I/O devices working directly in memory is going to only become | more common. 100Gbps network connections exist in supercomputer | labs, 10Gbps Ethernet is around the corner for consumers. NVMe | drives are pushing I/O to such high bandwidths that'd make DDR2 | RAM blush. GPUs are growing more complicated and are rumored to | start turning into distributed chiplets soon. USB3.0 and beyond | are high-speed links that directly drop off data into linear | address spaces (or so I've been told). Etc. etc. | edave64 wrote: | There is often a quite significant distance between the | beautiful, elegant and efficient design that brings tears to the | eyes of a designer, and being pragmatic and financially viable. 
| | Building a new competitive processor architecture isn't feasible | if you can't at least ensure compile-time compatibility with | existing programs. People won't buy a processor that won't run | their programs. | ajb wrote: | This article compares CHERI to an 80's computer, the Rational | R1000 (which I'm glad to know of). It's worth noting that CHERI's | main idea was explored in the 70's by the CAP computer[1]. CAP | and CHERI are both projects of the University of Cambridge's | Computer Lab. It's fairly clear that CAP inspired CHERI. | | [1] https://en.wikipedia.org/wiki/CAP_computer | yvdriess wrote: | Are you sure it wasn't done before by IBM in the '60s? | | That's usually the case. For hardware, at least | | For software, it usually was done before by Lisp in the '70s. | Animats wrote: | The original machines like that were the Burroughs 5000 | (1961), and the Burroughs 5500 (1964), which was quite | successful. Memory was allocated by the OS in variable length | chunks. Addresses were not plain numbers; they were more like | Unix paths, as in /program/function/variable/arrayindex. | | That model works, but is not compatible with C and UNIX. | heavenlyblue wrote: | How would you address recursive functions this way? | EvanAnderson wrote: | You beat me! CHERI totally made me think about those | machines. | | There's some good background here for those who are | interested: https://www.smecc.org/The%20Architecture%20%20o | f%20the%20Bur... | | The architecture of the B5000 / B5500 / B6500 lives on | today in the Unisys ClearPath line. I believe the OS, MCP, | is one of the longest-maintained software operating systems | still in active use, too. | monocasa wrote: | IBM didn't really play with hardware object capabilities | until the S/38, and even the it's a bit of a stretch to call | them that. | cmrdporcupine wrote: | Another system that had an object-based non-linear address space | I believe was the "Rekursiv" CPU developed at Linn (yes, the | Swedish audio/drum machine company; EDIT: Linn. Scottish. Not | drum machine. Thanks for the corrections. In fact I even knew | this at one time. Yay brain.) in the 80s. | | https://en.wikipedia.org/wiki/Rekursiv | | I actually have a copy of the book they wrote about it here | somewhere. I often fantasize about implementing a version of it | in FPGA someday. | Gordonjcp wrote: | > Linn (yes, the Swedish audio/drum machine company) in the 80s | | Uhm. | | Linn the audio company, known as Linn Products, are Scottish, | being based a little to the south of Glasgow, and named after | the park the original workshop was beside. | | Linn the drum machine company, known as Linn Electronics, were | American, being founded by and named after Roger Linn. | | Two totally different companies, run by totally different | people, not connected in any way, and neither of them Swedish. | | The Linn Rekursiv was designed by the audio company, and was | largely unsuccessful, and none exist any more - not even bits | of them :-/ | cmrdporcupine wrote: | oops :-) | kwhitefoot wrote: | Surely Linn is Scottish. | martincmartin wrote: | "Unsafe at Any Speed" is the name of Ralph Nader's book on car | manufacturers resisting car safety measures. It resulted in the | creation of the United States Department of Transportation in | 1966 and the predecessor agencies of the National Highway Traffic | Safety Administration in 1970. | akdor1154 wrote: | > They also made it a four-CPU system, with all CPUs operating in | the same 64-bit global address space. 
It also needed a good 1,000 | amperes at 5 volts delivered to the backplane through a dozen | welding cables. | | That is absolutely terrifying. | buildbot wrote: | These days you just use 12v and convert right next to or on die | - but we are still in that range of amps for big chips! Take | for example a 3090 at 500w @12v, the core is running at 1.056v, | that's 473 Amps! ___________________________________________________________________ (page generated 2022-06-29 23:00 UTC)