[HN Gopher] The road to Zettalinux ___________________________________________________________________ The road to Zettalinux Author : rwmj Score : 265 points Date : 2022-09-23 12:28 UTC (10 hours ago) (HTM) web link (lwn.net) (TXT) w3m dump (lwn.net) | t3estabc wrote: | wdutch wrote: | Hi GPT3, I didn't expect to see you here. | Majestic121 wrote: | Why are you posting GPT3 responses here? | LinkLink wrote: | For reference 2^64 = ~10^19.266. I don't think this is | unreasonable at all; it's unlikely that computers will largely | stay the same in the coming years. I believe we'll see many | changes to how things like mass addressing of data and computing | resources are done. Right now our limitations in these regards are | addressed by distributed computing and databases, but in a | hyper-connected world there may come a time when such a huge address | space could actually be used. | | It's an unlikely hypothetical but imagine if fiber ran | everywhere, and all computers seamlessly worked together sharing | computer power as needed. Even 256 bits wouldn't be out of the | question then. And before you say something like that will never | happen, consider trying to convince somebody from 2009 that in 13 | years people would be buying internet money backed by nothing. | est31 wrote: | The extra bits might be used for different things like e.g. in | CHERI. The address space is still 64 bits, but there are 64 | bits of metadata added to it, so you get a 128 bit | architecture. | Tsarbomb wrote: | Not agreeing or disagreeing for the most part, but in 2009 all | of the prerequisite technology for cryptocurrency existed: | general purpose computers for the average person, accessible | internet, cryptographic algorithms and methods, and cheap | storage. | | For 256 bit computers, we need entirely new CPU architectures | and updated ISAs for not just x86/AMD64, but for other archs | increasing in popularity such as ARM and even RISC-V. Even then | compilers, build tools, and dependent devices with their | drivers need updates too. On top of all of this technical work, | you have the political work of getting people to agree on new | standards and methods. | mhh__ wrote: | 256 bits in the case of a worldwide mega-computer would be | such a huge departure from current architectures and more | importantly latency-numbers that we can barely even speculate | about it. | | It may be of note that hypothetically one can have a soft-ISA | 128 bit virtual address (a particularly virtual virtual | address) which is JITed down into a narrower physical address | by the operating system. This is, as far as I'm aware, how IBM | i works. | masklinn wrote: | I don't know, it seems excessive to me. I could see the cold | storage maybe, with spanning storage pools (by my reckoning | there were 10TB drives in 2016 and the largest now are 20, so | 16 years from now should be 320 if it keeps doubling, which is | 5 orders of magnitude below still). | | > Right now our limitations in these regards are addressed by | distributed computing and databases, but in a hyper-connected | world there may come a time when such a huge address space could | actually be used. | | Used at the core of the OS itself? How do you propose to beat | the speed of light exactly? | | Because you don't need a zettabyte-compatible kernel to run a | distributed database (or even file system, see ZFS), trying to | DMA things on the other side of the planet sounds like the | worst possible experience.
| | Hell, our current computers right now are not even close to 64 | bit address spaces. The baseline is 48 bits, and x86 and ARM | are in the process of extending the address space (to 57 bits | for x86, and 52 for ARM). | alain94040 wrote: | Thanks to Moore's law, you can assume that DRAM capacity will | double every 1-3 years. Every time it doubles, you need one | more bit. So if we use 48 bits today, we have 16 bits left to | grow, which gives us at least 16 years of margin, and maybe | 48 years. (and it could be even longer if you believe that | Moore's law is going to keep slowing down). | Deukhoofd wrote: | > all computers seamlessly worked together sharing computer | power as needed. Even 256 bits wouldn't be out of the question | then. | | This sounds like it would be massively out of scope for Linux. | It'd require a complete overhaul of most of its core | functionality, and all of its syscalls. While not a completely | infeasible idea, it sounds to me like it'd require a completely | newly designed kernel. | jmillikin wrote: | > It's an unlikely hypothetical but imagine if fiber ran | > everywhere, and all computers seamlessly worked together sharing | > computer power as needed. Even 256 bits wouldn't be out of the | > question then. | | You could do this today with 192 bits (128-bit IPv6 address, | 64-bit local pointer). Take a look at RDMA, which could be | summarized as "every computer's RAM might be any computer's | RAM". | | The question is whether such an address makes sense for the | Linux kernel. If your hyper-converged distributed program wants | to call `read()`, does the pointer to the buffer really need to | be able to identify any machine in the world? Maybe it's enough | for the kernel to use 64-bit local pointers only, and have a | different address mechanism for remote storage. | krackout wrote: | I think the article is very shortsighted. By 2035-40 we'll | probably have memory-only (RAM) computers massively available. No | disks means no current OS capable of handling these computers. A | change of paradigm needing new platforms and OSes. | | These future OSes may be 128bit, but I don't think the current | ones will make it to the transition. | alpb wrote: | There are plenty of OSes today capable of booting and running | from RAM. Pretty sure we wouldn't be burning all the prominent | OSes for something like that. | zasdffaa wrote: | Sounds nuts. Does anyone know how much power a 32GB DIMM draws? | How much would a fully populated 64-bit address space therefore | pull? | | Edit: if a 4GB (32 bits used) DIMM pulls 1 watt, the remaining 32 | bits of address space mean ~4E9 such modules, so your memory is | pulling ~4 gigawatts alone. That's not supportable, given the other | electronics needed to go around it. | Beltiras wrote: | I would think by now any bunch of clever people would be trying | to fix the generalized problem of supporting n-bit memory | addressing instead of continually solving the single problem of | "how do we go from n*2 to (n+1)*2". I guess it's more practical | to just let the next generation of kernel maintainers go through | all of this hullabaloo again in 2090. | tomcam wrote: | Just wanted to say I love this discussion. Have been pondering | the need for a 128-bit OS for decades but several of the issues | raised were completely novel to me. Fantastic to have so many | people so much smarter than I am hash it out informally here. | Feels like a master class.
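(To make jmillikin's 192-bit arithmetic above concrete, here is a minimal C sketch of such a machine-plus-offset "global pointer". The struct name and field layout are illustrative only, not any real RDMA API:)

    #include <stdint.h>

    /* Hypothetical 192-bit "global pointer": a 128-bit IPv6 address
     * naming the owning machine plus a 64-bit pointer within it. */
    struct global_ptr {
        uint8_t  node[16]; /* 128-bit IPv6 address of the owning host */
        uint64_t local;    /* 64-bit virtual address on that host */
    };                     /* 16 + 8 bytes = 192 bits total */

(Whether the kernel itself should ever see such a pointer, rather than a userspace library resolving it down to a plain 64-bit address, is exactly the question jmillikin raises.)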
| znpy wrote: | I wonder if with 128-bit wide pointers it would make sense to | start using early-lisp-style tagged pointers. | Aqueous wrote: | If we're going to go for 128, why not just go for 256? That way | we won't have to do this again for a while. | | Or better yet, design a new abstraction for not having to | hard-code the limit of the pointer size but instead allow it to be | extensible as more addressable space becomes a reality, instead | of having to transition over and over. Is this even possible? If | it is, shouldn't we head in that direction? | kmeisthax wrote: | The problem with variable-sized pointers is that... | | 1. Any abstraction you could make will have worse performance | than a fixed-size machine pointer | | 2. In order to support _any kind_ of variably-sized type you | need machine pointers to begin with, and those will always be | fixed-size because variable size is even harder to support in | hardware than native code | | And furthermore going straight to 256 has its own problems. | Each time you double the pointer size you also significantly | increase the size of structures with a lot of pointers. V8 | notably uses "pointer compression" - i.e. using 32-bit offsets | instead of 64-bit pointers, because it never needs >4GB of | JavaScript objects at once and JS objects are _very_ | pointer-ridden. | | There are two forces at play here: pointers need to be small | enough to embed in any data structure and large enough to | address the entire working set of the program. Larger pointers | are not inherently better[0], and neither are smaller pointers. | It's a balancing act. | | [0] ASLR, PAC, and CHERI are exceptions, as mentioned in the | original article. | tayistay wrote: | Is 128 bit the limit of what we would need? We use 128 bit UUIDs. | 2^256 seems to be more than the number of atoms on Earth. | wongarsu wrote: | The article does talk about just making the pointer type used | in syscalls 256 bit wide to "give room for any surprising | future needs". | | The size of large networked disk arrays will grow beyond 64 bit | addresses, but I don't think we will exceed 2^128 bits of | storage of any size, for any practical application. Then again, | there's probably people who thought the same about 32 bit | addresses when we moved from 16bit to 32bit addresses. | | The most likely case for "giant" pointers (more than 128 bits) | will be adding more metadata into the pointer. With time we | might find enough use cases that are worth it to go to 256bit | pointers, with 96bit address and 160 bit metadata or something | like that. | dylan604 wrote: | >Then again, there's probably people who thought the same | about 32 bit addresses when we moved from 16bit to 32bit | addresses. | | There's a fun "quote" about 640K being all anyone would ever | need, so clearly everyone just needs to settle down and | figure out how to refactor their code. | shadowofneptune wrote: | The IBM PC's 20-bit addressing was 16 times the size of | 16-bit addresses. From 20-bit to 32-bit, 4096 times larger. | 32 to 64 is 4,294,967,296 times larger (!). The scale alone | makes using all this space unlikely on a PC. | jupp0r wrote: | "The problem now is that there is no 64-bit type in the mix. One | solution might be to "ask the compiler folks" to provide a | __int64_t type. But a better solution might just be to switch to | Rust types, where i32 is a 32-bit, signed integer, while u128 | would be unsigned and 128 bits.
This convention is close to what | the kernel uses already internally, though a switch from "s" to | "i" for signed types would be necessary. Rust has all the types | we need, he said, it would be best to just switch to them." | | Does anybody know why they don't use the existing fixed size | integer types [1] from C99, i.e. uint64_t etc., and define a 128 bit | wide type on top of that (which will also be there in C23 IIRC)? | | My own kernel dev experience is pretty rusty at this point (pun | intended), but in the last decade of writing cross platform | (desktop, mobile) userland C++ code I advocated exclusively for | using fixed width types (std::uint32_t etc) as well as constants | (UINT32_MAX etc). | ghoward wrote: | It would be sad if we, as an industry, do not take this | opportunity to create a better OS. | | First, we should decide whether to have a microkernel or a | monolithic kernel. | | I think the answer is obvious: microkernel. This is much safer, | and seL4 has shown that performance need not suffer too much. | | Next, we should start by acknowledging the chicken-and-egg | problem, especially with drivers. We will need drivers. | | So let's reuse Linux drivers by implementing a library for them | to run in userspace. This would be difficult, but not | impossible, and the rewards would be _massive_, basically | deleting the chicken-and-egg problem for drivers. | | To solve the userspace chicken-and-egg problem (having | applications that run on the OS), implement a POSIX API on top of | the OS. Yes, this will mean that some bad legacy like `fork()` | will exist, but it will solve that chicken-and-egg problem. | | From there, it's a simple matter of deciding what the best design | is. | | I believe it would be three things: | | 1. Acknowledging hardware as in [1]. | | 2. A copy-on-write filesystem with a transactional API (maybe a | modified ZFS or BtrFS). | | 3. A uniform event API like Windows' handles and Wait() functions | or Plan 9's file descriptors. | | For number 3, note that not everything has to be a file, but | receiving events like signals and events from child processes | should be waitable, like in Windows or Linux's signalfd and | pidfd. | | For number 2, this would make programming _so_ much easier on | everybody, including kernel and filesystem devs. And I may be | wrong, but it seems like it would not be hard to implement. When | doing copy-on-write, just copy as usual, and update the root | B-tree node; the transaction commits when the root B-tree node is | flushed to disk, and the flush succeeds. | | (Of course, this would also require disks that don't lie, but | that's another problem.) | | [1]: https://www.usenix.org/conference/osdi21/presentation/fri-ke... | gwbas1c wrote: | > Matthew Wilcox took the stage to make the point that 64 bits | may turn out to be too few -- and sooner than we think | | Let's think critically for a moment. I grew up in the 1980s and | 1990s, when we all craved more and more powerful computers. I | even remember the years when each generation of video games was | marketed as 8-bit, 16-bit, 32-bit, etc. | | BUT: We're hitting a point where, for what we use computers | for, they're powerful enough. I don't think I'll ever need to | carry a 128-bit phone in my pocket, nor do I think I'll need a | 128-bit web browser, nor do I think I'll need a 128-bit web | server. (See other posts about how 64-bits can address massive | amounts of memory.) | | Will we need 128-bit computing? I'm sure _someone_ will find a | need.
But let's not assume they'll need an operating system | designed in the 1990s for use cases that we can't imagine today. | mikepurvis wrote: | IMO there's an important distinction to be made between high-bit | _addressing_ and high-bit _computing_. | | Like, no one has enough memory to need more than 64 bits for | addressing, and that is likely to remain the case for the | foreseeable future. However, 128- and 256-bit values are | commonly used in domains like graphics, audio, and so-on, where | you need to apply long chains of transformations and filters, | but retain as much of the underlying dynamic range as possible. | PaulDavisThe1st wrote: | Those are data types, not pointer values or filesystem | offsets. Totally different thing. | mikepurvis wrote: | Is that strictly the definition, that the bit-width of a | processor refers only to its native pointer size? | | I know it's hardly a typical or modern example, but the N64 | had just 4MB of memory (8MB with the expansion pack). It | most certainly didn't need 64-bit pointers to address that | pittance, so it was a "64 bit processor" largely for the | purposes of register/data size. | PaulDavisThe1st wrote: | Not so much a definition as where the problem lies. | | If the thing that can refer to a memory address changes | size, there are very different problems than will arise | if the size of "an integer" changes. | | You could easily imagine a processor that can only | address an N-bit address space, but can trivially do | arithmetic on N*M bit integers or floating point values. | And obviously the other way around, too. | | In general, I think "N bit processor" tends to refer to | the data type sizing, but since those primitive data | types will tend to fit into the same registers that are | used to hold pointers, it ends up describing addressing | too. | uluyol wrote: | Supercomputers? Rack-scale computing? See some of the work | being done with RDMA and "far memory". | gwbas1c wrote: | Yes... And do you need that in the same kernel that goes into | your phone and your web server? | jerf wrote: | I think the argument for 128-bits being necessary strictly for | routing memory in a single "computer" is fairly weak. We're | already seeing the plateauing of memory sizes nowadays; what | was exponentially in reach within just a couple of decades | exponentially recedes from us as our progress goes from | exponential to polynomial. | | But the argument that we need more than 64 bit capability for a | lot of other reasons _in conjunction_ with memory | addressability is I think very strong. A lot of very powerful | and safe techniques become available if we can tag pointers | with more than a bit squeezed out here and a bit squeezed out | there. I could even see hard-coding the CPU to, say, look at 80 | bits as an address and then use the remaining 48 for tagging of | various sorts. There's precedent, and 80 bits is an awful lot | of addressable memory; that's a septillion+ addressable bytes; | by the time we need more than that, if we do, our future selves | can deal with that. (It is good to look ahead a decade or two | and make reasonable preparations, as this article does; it is | hubris to start trying to look ahead 50 years or a century.) | wongarsu wrote: | We are talking about the OS kernel that supports 4096 CPU | cores. The companies who pay their engineers to do Linux kernel | development tend to be the same ones that have absurd needs.
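(A minimal C sketch of the 80/48 split jerf floats a few comments up; the constants and helper names are made up for illustration, and the non-standard __int128 extension of GCC/Clang stands in for a true 128-bit machine word:)

    #include <stdint.h>

    typedef unsigned __int128 u128;

    #define ADDR_BITS 80
    #define ADDR_MASK ((((u128)1) << ADDR_BITS) - 1)

    /* Pack an 80-bit address and up to 48 bits of tag metadata into
     * one 128-bit pointer word; excess tag bits are silently dropped. */
    static inline u128 ptr_pack(u128 addr, uint64_t tag)
    {
        return (addr & ADDR_MASK) | ((u128)tag << ADDR_BITS);
    }

    static inline u128 ptr_addr(u128 p) { return p & ADDR_MASK; }

    static inline uint64_t ptr_tag(u128 p) { return (uint64_t)(p >> ADDR_BITS); }

(CHERI-style capabilities, mentioned elsewhere in the thread, carry bounds and permissions in the extra bits instead of a free-form tag.)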
| p_l wrote: | And that's not enough to boot on some platforms that Linux | runs on (even with amd64 ISA) unless one partitions the | computer into smaller CPU counts | teddyh wrote: | Maybe we can finally fix intmax_t to be the largest integer type | again. | tonnydourado wrote: | The "In my shoes?" bit was hilarious | PaulHoule wrote: | On one hand The IBM System/38 used 128 bit pointers in the 1970s, | despite having a 48 bit physical address bus. These were used to | manage persistent objects on disk or network with unique ids a | lot like uuids. | | On the other hand, filling out a 64 bit address space looks | tough. I struggled to find something of the same magnitude of | 2^64 and I got 'number of iron atoms in an iron filing'. From a | nanotechnological point of view a memory bank that size is | feasible (fits in a rack at 10,000 atoms per bit) but progress in | semiconductors is slowing down. Features are still getting | smaller but they aren't getting cheaper anymore. | [deleted] | protomyth wrote: | The AS/400 and iSeries also use 128-bit pointers. 128-bit would | be useful for multiple pointers already in common use such as | ZFS and IPv6 addresses. I expect it will be the last hop for a long | time. | PaulHoule wrote: | Those are evolved from the System/38. | protomyth wrote: | Yeah, IBM is one company that shows how to push the models | down the road. They do take their legacy seriously. | EvanAnderson wrote: | In the context of the AS/400's single-level store | architecture the 128-bit pointers make a lot of sense, too. | bdn_ wrote: | I thought up a few ways to visualize 2^64 unique items: | | - You could give every ant on Earth ~920 unique IDs without any | collisions | | - You could give unique IDs for every brain neuron for all ~215 | million people in Brazil | | - The ocean contains about 20 x (2^64) gallons of water (3.5267 | x 10^20 gallons total) | | - There are between 100-400 billion stars in the Milky Way, so | you could assign each star between 46,000,000-184,000,000 | unique IDs each | | - You could assign ~2.5 unique IDs to each grain of sand on | Earth | | - If every cell of your body contained a city with 500,000 | people each, every "citizen" of your body could have a unique | ID without any collisions | | Calculating these figures is actually a lot of fun! | tonnydourado wrote: | Those are great examples | Eduard wrote: | There are only ~368 grains of sand per ant? | fragmede wrote: | Well no. See there's this one ant, Jeff, that's hogging | them all for itself, so each ant only gets 50 grains | of sand. | ManuelKiessling wrote: | No wonder I keep reading about sand shortages. | lloeki wrote: | > filling out a 64 bit address space looks tough. I struggled | to find something of the same magnitude of 2^64 and I got | 'number of iron atoms in an iron filing' | | Reminded me of Jeff Bonwick's answer to the following question | about his 'boiling the oceans' quip related to ZFS being a "128 | bit filesystem": | | > 64 bits would have been plenty ... but then you can't talk | out of your ass about boiling oceans then, can you? | | Sadly his Sun hosted blog was eaten by the migration to Oracle, | so thanks to the Internet Archive again: | | http://web.archive.org/web/20061111054630/http://blogs.sun.c...
| | That one dives into some of the "handwaving" a bit: | | https://hbfs.wordpress.com/2009/02/10/to-boil-the-oceans/ | | And that one goes into how much energy it would take to _merely | spin enough disks up_: | | https://www.reddit.com/r/DataHoarder/comments/71p8x4/reachin... | PaulHoule wrote: | One absolute limit of computation is that it takes | (1/2) kT | | of energy to delete one bit of information where k is the | Boltzmann constant and T is the temperature. Let T = 300 K | (room temperature) | | I multiplied that by 2^128, and got 1.41x10^18 J of energy. 1 | ton of TNT is 4.2x10^12 J, so that is a 335 kiloton explosion | worth of energy just to boot. | | That's not impossible, that much heat is extracted from a | nuclear reactor in a few months. If you want to go faster you | need a bigger system, but a bigger system will be slower | because of light speed latency. | | (You do better, however, at a lower temperature, say 1 K, | but heat extraction gets more difficult at lower temperatures | and you spend energy on refrigeration unless you wait long | enough for the Universe to grow colder.) | tuatoru wrote: | > 1.41x10^18 J of energy. | | Used over the course of a year, that is a constant 44.4 GW. | Less than Bitcoin uses already | turtletontine wrote: | In fairness, the _need_ for 128-bit addressable systems | will be when 64 address bits is not enough. That will be | long before people are using 2^128 bytes on one system. So | doing the calculation with 2^65 bytes would be a more even | handed estimate of the machine that would require this | tuatoru wrote: | > On one hand The IBM System/38 used 128 bit pointers in the | 1970s, despite having a 48 bit physical address bus. | | And the original processor was 24 bits, then it was upgraded to | 36 bits (not a typo: 36 bits), and then to POWER 64 bits. | | (When that last happened, it was re-badged AS/400. Later, | marketing renamed the AS/400 to iSeries, and then to IBM i, | without changing anything significant. Still uses Power CPUs, | AFAIK). | | For users, upgrades were a slightly longer than usual backup | and restore. | | What's the hard part here? | mhh__ wrote: | IBM i is still in development and also has a 128 bit pointer. | wongarsu wrote: | What's the average lifespan of a line of kernel code? I imagine | by starting this project 12 years before its anticipated use case | they can get very far just by requiring that any new code is | 128-bit compatible (in addition to doing the broader | infrastructure changes needed like fixing the syscall ABI) | rwmj wrote: | _> What's the average lifespan of a line of kernel code?_ | | There's a fun tool called "Git of Theseus" which can answer | this question! You can see some graphs of Linux code on the web | page: https://github.com/erikbern/git-of-theseus | | Named after the Ship of Theseus: | https://en.wikipedia.org/wiki/Ship_of_Theseus | forgotpwd16 wrote: | There're some more in the presentation article: | https://erikbern.com/2016/12/05/the-half-life-of-code.html#:... | | A (Linux) kernel line has half-life 6.6 years. The highest of | the projects analyzed. The lowest went to Angular with | half-life 0.32 years. | trasz wrote: | Not sure about those 12 years - 128-bit registers are already | there, and CHERI Morello prototype is at a "physical silicon | using this functionality under CheriBSD" stage.
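(The half-life figure above invites a back-of-the-envelope check of wongarsu's 12-year question; a small C sketch, assuming the measured 6.6-year half-life simply continues to hold:)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double half_life = 6.6;  /* years, per the Git of Theseus data */
        double horizon   = 12.0; /* years until the anticipated need   */
        /* exponential decay: fraction = 0.5^(t / half_life) */
        double surviving = pow(0.5, horizon / half_life);
        printf("%.0f%% of today's kernel lines survive %g years\n",
               100.0 * surviving, horizon); /* prints about 28% */
        return 0;
    }

(So under that assumption, only about a quarter of today's lines would still be around when 128-bit support is needed; the rest could be held to a "128-bit clean" rule as it gets written.)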
| jmillikin wrote: | The section about 128-bit pointers being necessary for expanded | memory sizes is unconvincing -- 64 bits provides 16 EiB (16 x | 1024 x 1024 x 1024 x 1 GiB), which is the sort of address space | you might need for byte-level addressing of a warehouse full of | high-density HDDs. Memory sizes don't grow like they used to, and | it's difficult to imagine what kind of new physics would let | someone fit that many bytes into a machine that's practical to | control with a single Linux kernel instance. | | CHERI is a much more interesting case, because it expands the | definition of what a "pointer" is. Most low-level programmers | think of pointers as _just_ an address, but CHERI turns it into a | sort of tuple of (address, bounds, permissions) -- every pointer | is bounds-checked. The CHERI folks did some cleverness to pack | that all into 128 bits, and I believe their demo platform uses | 128-bit registers. | | The article also touches on the UNIX-y assumption that `long` is | pointer-sized. This is well known (and well hated) by anyone that | has to port software from UNIX to Windows, where `long` and `int` | are the same size, and `long long` is pointer-sized. I'm firmly | in the camp of using fixed-size integers but the Linux kernel | uses `long` all over the place, and unless they plan to do a mass | migration to `intptr_t` it's difficult to imagine a solution that | would let the same C code support 32-, 64-, and 128-bit | platforms. | | (comedy option: 32-bit int, 128-bit long, and 64-bit `unsigned | middle`) | | The article also mentions Rust types as helpful, but Rust has its | own problems with big pointers because they inadvisably merged | `size_t`, `ptrdiff_t`, and `intptr_t` into the same type. They're | working on adding equivalent symbols to the FFI module[0], but | untangling `usize` might not be possible at this point. | | [0] https://github.com/rust-lang/rust/issues/88345 | travisgriggs wrote: | > (comedy option: 32-bit int, 128-bit long, and 64-bit | `unsigned middle`) | | Or rather than keep moving the long goalpost, keep long at | u64/i64 and add prolong(ed) for 128. Or we could keep long as | the "nominal" register value, and introduce "short long" for | 64. So many options. | dragontamer wrote: | > it's difficult to imagine what kind of new physics would let | someone fit that many bytes into a machine that's practical to | control with a single Linux kernel instance. | | I nominally agree with most of your post. But I should note | that modern systems seem to be moving towards a "one pointer | space" for the entire cluster. For example, 8 GPUs + 2 CPUs | would share the same virtual memory space (GPU#1 may take one | slice, GPU#2 takes another, etc. etc.). | | This allows for RDMA (ie: mmap across Ethernet and other | networking technologies). If everyone has the same address | space, then you can share pointers / graphs between nodes and | the underlying routing/ethernet software will be passing the | data automatically between all systems. It's actually quite | convenient. | | I don't know how the supercomputer software works, but I can | imagine 4000 CPUs + 16000 GPUs all sharing the same 64-bit | address space. | _the_inflator wrote: | I agree with you. | | And seeing datacenter after datacenter shooting up like | mushrooms, there might be some sort of abstraction running in | this direction, that makes 128bit addresses feasible. At the | moment 64bit seems like paging in this sense.
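(The `long`-vs-pointer trap jmillikin describes above is easy to show in a few lines of C; LP64 here means 64-bit Unix, LLP64 means 64-bit Windows:)

    #include <stdint.h>

    /* On LP64 this round-trip is lossless because long is pointer-
     * sized; on LLP64, long is 32 bits and the cast truncates. */
    void *bad_roundtrip(void *p)
    {
        long v = (long)p;         /* loses the top 32 bits on LLP64 */
        return (void *)v;
    }

    /* intptr_t is defined to hold any object pointer on a conforming
     * platform, whatever its width -- 32, 64, or a future 128 bits. */
    void *good_roundtrip(void *p)
    {
        intptr_t v = (intptr_t)p; /* always wide enough */
        return (void *)v;
    }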
| shadowofneptune wrote: | Even so, existing ISAs could address more memory using | segmentation. AMD64 has a variant of long mode where the | segment registers are re-enabled. For the special programs | that need such a large space, far pointers wouldn't be that | complicated. | rwmj wrote: | Distributed Shared Memory is a thing, but I'm not sure how | widely it is used. I found that it gives you all the | coordination problems of threads in symmetric multiprocessing | but at a larger scale and with much slower synchronisation. | | https://en.wikipedia.org/wiki/Distributed_shared_memory | dragontamer wrote: | https://en.wikipedia.org/wiki/Remote_direct_memory_access | | Again, I'm not a supercomputer programmer. But the | whitepapers often discuss RDMA. | | From my imagination, it sounds like any other "mmap". You, | the programmer, just remember that the mmap'd region is | slower (since it is read/write to a disk, rather than to | RAM). Otherwise, you treat it "like RAM" from a programming | perspective entirely for convenience's sake. | | As long as you know my_mmap_region->next = foobar(); is a | slow I/O operation pretending to be memory, you're fine. | | --------- | | Modern systems are converging upon this "single address | space" programming model. PCIe 3.0 implements atomic | operations and memory barriers, CXL is going to add | cache-coherence over a remote / I/O interface. This means that | all your memory_barriers / atomics / synchronization can be | atomic-operations, and the OS will automatically translate | these memory commands into the proper I/O level | atomics/barriers to ensure proper synchronization. | | This is all very new, only within the past few years. But I | think it's one of the most exciting things about modern | computer design. | | Yes, it's slow. But it's consistent and accurately modeled by | all elements in the chain. Atomic-compare-and-swap over | RDMA can allow for cache-coherent communications and | synchronization over Ethernet, over GPUs, over CPUs, and | any other accelerators sharing the same 64-bit memory | space. Maybe not quite today, but soon. | | This technology already exists for PCIe 3.0 CPU+GPUs | synchronization from 8 years ago (Shared Virtual Memory). | It's exciting to see it extend out into more I/O devices. | FuriouslyAdrift wrote: | RDMA is used heavily in SMB3 file systems for Microsoft | Hyper-V failover clusters. | c-linkage wrote: | Distributed Memory Access is just another kind of Non-Uniform | Memory Access, which is Yet Another Leaky | Abstraction. Specifically, if you care about performance | _at all_ you now have to worry about _where in RAM your | data lives_. | | Caring about where in memory your data lives is different | from dealing with cache or paging. Programmers have to | plan ahead to keep frequently accessed data in fast RAM, | and infrequently accessed data in "slow" RAM. You'll | probably need special APIs to allocate and manage memory | in the different pools, not unlike the Address Windowing | Extensions API in Microsoft Windows. | | And once you extend "memory" outside the chassis, you'll | have to design your application with the expectation that | _any memory access could fail_ because a network failure | means the memory is no longer accessible. | | If you only plan to deploy in a data center then _maybe_ | you can ignore pointer faults, but that is still a risk, | especially if you decide to deploy something like Chaos | Monkey to test your fault tolerance.
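(dragontamer's "slow I/O pretending to be memory" point above is the ordinary mmap() contract; a minimal single-machine POSIX sketch, with RDMA-backed mappings extending the same illusion across a network:)

    #include <stddef.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct node { long value; long next_index; };

    /* Map a file of nodes; afterwards, plain pointer dereferences
     * become page-granular file I/O behind the programmer's back. */
    struct node *map_nodes(const char *path, size_t len)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return NULL;
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd); /* the mapping outlives the descriptor */
        return p == MAP_FAILED ? NULL : p;
    }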
| zozbot234 wrote: | > And once you extend "memory" outside the chassis, | you'll have to design your application with the | expectation that any memory access could fail because a | network failure means the memory is no longer accessible. | | You have to deal with these things anyway in any kind of | distributed setting. What this kind of | location-independence via SSI really buys you is the ability to | scale the exact same workloads _down_ to a single cluster | or even a single node when feasible, while keeping an | efficient shared-memory programming model instead of | doing slow explicit message passing. It seems like a | pretty big simplification. | jandrewrogers wrote: | I've written code for a few different large-scale SSI | architectures. The shared-memory programming model is | _less_ efficient than explicit message passing in | practice because it is much more difficult to optimize. | The underlying infrastructure is essentially converged at | this point, so performance mostly comes down to the | usability of the programming model. | | The marketing for SSI was that it was simple because | programmers would not have to learn explicit message | passing. Unfortunately, people that buy supercomputers | tend to care about performance, so designing a | supercomputer that is difficult to optimize misses the | point. In real code, the only way to make them perform | well was to layer topology-aware message passing on top | of the shared memory model. At which point you should've | just bought a message passing architecture. | | There is only one type of large-scale SSI architecture | that is able to somewhat maintain the illusion of uniform | shared memory -- hardware latency-hiding e.g. barrel | processors. If programmers have difficulty writing | scalable code with message passing, then they | _definitely_ are going to struggle with this. These | systems use a completely foreign programming paradigm | that looks deceptively like vanilla C++. Exceptionally | efficient, and companies design new ones every few years, | but without programmers that grok how to write optimal | code they aren't much use. | sakras wrote: | At least at this latest SIGMOD, it felt like everyone and | their dog was researching databases in an RDMA | environment... so I'd imagine this stuff hasn't peaked in | popularity. | throw10920 wrote: | That's what I was going to chime in with - you pay for that | extra address width. Binary addition and multiplication | latency is super-linear with regards to operand width. | Larger pointers lead to more memory use, and memory access | latency is non-constant with respect to size. | | It _might_ make sense for large distributed systems to move | to a 128-bit architecture, but I don't see any reason for | consumer devices, at least with current technology. | dragontamer wrote: | > Binary addition ... is super-linear with regards to | operand width | | No, it's not. That's why Kogge-Stone's carry lookahead | adder was such an amazing result. O(log(n)) latency with | respect to operand width with O(n) total half-adders | used. | | It may seem like it's super-linear. But the power of | prefix-sums leads to a spectacular and elegant solution. | Kogge-Stone (and the concept of prefix-sums) is one of | the most important parallel-programming / parallel-system | results of the last 50 years. Dare I say it, it's _THE_ | most important parallel programming concept. | | > multiplication latency | | You could just... not implement 128-bit multiplication.
Just support 128-bit pointers (aka: addition) and leave | multiplication for 64-bits and below. | throw10920 wrote: | You're right, binary addition isn't super-linear. It is | non-constant, though, which is a slightly surprising | result if you don't know much about hardware. | dragontamer wrote: | Kogge-Stone's O(log2(n)) latency complexity might as well | be constant. The difference between 64-bit and 128-bit is | the difference between 6 and 7. | | There's going to be no issues implementing a 128-bit | adder. None at all. | pclmulqdq wrote: | Multiplication of 128 bit numbers is also not a big | issue. Today you can do it with 4 MULs and some addition, | and it would still be faster than 64-bit division. | Hardware multiplication of 128-bit numbers would have | area problems more than speed problems. You could always | throw out the top 128 bits of the result (since 128x128 | has a 256 bit result), and the circuit wouldn't be much | bigger than a full 64 bit multiplier. | meisanother wrote: | There's also something beautiful about seeing or creating | a Kogge-Stone implementation on silicon. | | I know it was one of the first times I thought to myself: | this is not just a straightforward pipeline, yet it all | follows such a beautifully geometrical interconnect | pattern. Super fast, yet very elegant to lay out. | dragontamer wrote: | The original paper is a masterpiece to read as well, if | you haven't read it. | | "A Parallel Algorithm for the Efficient Solution of a | General Class of Recurrence Equations", by Kogge and | Stone. | | It proves the result for _all_ associative operations | (technically, a class slightly larger than associative. | Kogge and Stone called this a "semi-associative" | operation). | meisanother wrote: | Well, just got it. Thanks for the reference! | | A bit sad that 1974 papers are still behind an IEEE | paywall... | | Edit: Just finished reading it. I have to say that the | generalization of 3.2 got a bit over me, but otherwise | it's pretty amazing that they could define such a | generalization. Intuition for those types of problems is | often to proceed one step at a time, N times. | | That it is provably doable in log2(N) is great, | especially since it allows for a choice of the | depth/number of processors you want to use for the | problem. Hopefully next time I design a | latency-constrained system I remember to look at that article | dragontamer wrote: | > Hopefully next time I design a latency-constrained | system I remember to look at that article | | Nah. Your next step is to read "Data parallel algorithms" | by Hillis and Steele, which starts to show how these | principles can be applied to code. (A much higher-level, | easier-to-follow paper. From ACM too, so it's free since | it's older than 2000.) | | Then you realize that all you're doing is following the | steps towards "Map-reduce" and modern parallel code and | just use Map Reduce / NVidia cub::scan / etc. etc. and | all the modern stuff that is built from these fundamental | concepts. | | Kogge and Stone's paper sits at the root of it all | though. | cmrdporcupine wrote: | It seems to me that such a memory space could be physically | mapped quite large while still presenting 64-bit virtual | memory addresses to the local node? How likely is it that any | given node would be mapping out more than 2^64 bytes worth of | virtual pages? | | The VM system could quite simply track the physical addresses | as a pair of `u64_t`s or whatever, and present those pages as | 64-bit pointers.
| | It seems in particular you might want to have this anyways, | because the actual costs for dealing with such external | memories would have to be much higher than local memory. | Optimizing access would likely involve complicated cache | hierarchies. | | I mean, it'd be exciting if we had need for memory space | larger than 2^64 but I just find it implausible with current | physics and programs? But I'm also getting old. | rektide wrote: | Leaving cluster coherent address space behind - like you | say - is doable. But you lose what the parent was saying: | | > _If everyone has the same address space, then you can | share pointers / graphs between nodes and the underlying | routing/ethernet software will be passing the data | automatically between all systems. It's actually quite | convenient._ | vlovich123 wrote: | Let's say you have nodes that have 10 TiB of RAM in them. | You then need 1.6M nodes (not CPUs, but actual boxes) to | use up 64bits of address space. It seems like the | motivation is to continue to enable Top500 machines to | scale. This wouldn't be coming to a commercial cloud | offering for a long time. | rektide wrote: | Why limit yourself to in-memory storage? I'd definitely | assume we have all our storage content memory mapped onto | our cluster too, in this world. People have been building | exabyte (a billion gigabytes) scale datacenters since well | before 2010, and 16 exabytes, the current Linux limit | according to the most upvoted post here, isn't that much | more inconceivable. | | Having more space available usually opens up more | interesting possibilities. I'm going to rattle off some | assorted options. If there's multiple paths to a given | bit of data, we could use different addresses to refer to | different paths. We could do something like ILA in IPv6, | using some of the address as a location identifier: | having enough bits for both the location and the identity | parts of the address without being too constrained would | be helpful. We could use the extra pointer bits for | tagged memory or something like CHERI, which allows all | kinds of access-control or permission or security | capabilities. Perhaps we create something like id's | MegaTexture, where we can procedurally generate data on | the fly if given an address. There's five options for why | you'd want more address space than addressable storage. | And I think some folks are already going to be quite | limited & have quite a lot of difficulty partitioning up | their address space, if they only have for example 1.6m | buckets of 1TB (one possible partitioning scheme). | | The idea of being able to refer to everything anywhere | that does or did exist across a very large space sure | seems compelling & interesting to me! | vlovich123 wrote: | Maybe. You are paying a significant performance penalty | for ALL compute to provide that abstraction though. | Aperocky wrote: | Sounds like a disaster in terms of potential bugs. | jerf wrote: | It is, but it's the same disaster of bugs we already have | from multiple independent cores sharing the same memory | space, not a brand new disaster or anything. | | It's a disaster of latency issues too, but it's not like | that's surprising anyone either, and we already have NUMA | on some multi-core systems which is the same problem. | | We have existing tools that can be extended in | straightforward ways to deal with these issues. And it's | not like there's a silver bullet here; having separate | address spaces everywhere comes with its own disaster of | issues.
Pick your poison. | sidewndr46 wrote: | Instead of having another thread improperly manipulating | your pointers and scribbling all over memory, now you can | have an entire cluster of distributed machines doing it. | This is a clear step forward. | rektide wrote: | Not that the industry doesn't broadly deserve this FUD | take/takedown, but perhaps possibly maybe it might end up | being really good & useful & clean & clear & lead to very | high functioning very performant very highly observable | systems, for some. | | Having a single system image has many potential upsides, | and understandability & reasonability are high among | them. | yencabulator wrote: | Just because you can refer to the identity of a thing | anywhere in the cluster doesn't mean it can't also be | memory-safe, capability-based, and just an RPC. | maxwell86 wrote: | > How likely is it that any given node would be mapping out | more than 2^64 bytes worth of virtual pages? | | In the Grace Hopper whitepaper, NVIDIA says that they | connect multiple nodes with a fabric that allows them to | create a virtual address space across all of them. | dylan604 wrote: | >(comedy option: 32-bit int, 128-bit long, and 64-bit | `unsigned middle`) | | rather than unsigned middle, could we just call it malcom? | cmrdporcupine wrote: | Thanks for pointing out the `usize` ambiguity. It drives me | nuts. I suspect it would make me even crazier if I was doing | embedded with Rust right now and had to work with unsafe | hardware pointers. | | (They also fail to split out an equivalent of `off_t`, too. | Not that I think that would have the same bit width | ambiguities. But it seems odd to refer to offsets by a 'size') | api wrote: | I can't imagine a single Linux kernel instance or single | program controlling that much bus-local RAM, but as you say | there are other uses. | | One use I can imagine is massively distributed computing where | pointers can refer to things that are either local or remote. | These could even map onto IPv6 addresses where the least | significant 64 bits are a local machine pointer and the most | significant 64 bits are the machine's /64. Of course the | security aspect would have to be handled at the transport layer | or this would have to be done on a private network. The latter | would be more common since this would probably be a | supercomputer thing. | | Still... I wonder if this needs CPU support or just compiler | support? Would you get that much more performance from having | this in hardware? | | I do really like how Rust has u128 native in the language. This | permits a lot of nice things including efficient implementation | of some cryptography and math stuff. C has irregular support | for uint128_t but it's not really a first class citizen. | gjvc wrote: | _I can't imagine a single Linux kernel instance or single | program controlling that much bus-local RAM, but as you say | there are other uses._ | | MS-DOS and 640K ... | api wrote: | The addressable size grows exponentially with more bits, | not linearly. 2^64 is not twice as big as 2^32. It's more | than four billion times as big. 2^32 was only 65536 times | as big as 2^16. | | Going past 2^64 bytes of local high speed RAM becomes a | physics problem. I won't say never but it would not just be | an evolutionary change from what we have and a processor | that could perform useful computations on that much data | would be equally nuts. Just moving that much data on a bus | of today would take too long to be useful, let alone | computing on it.
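(api's aside about C support is easy to demonstrate; GCC and Clang provide __int128 on 64-bit targets as an extension, but there is no standard uint128_t name, no printf conversion for it, and no 128-bit literals, so values get assembled from 64-bit halves:)

    #include <stdint.h>
    #include <stdio.h>

    typedef unsigned __int128 u128;

    int main(void)
    {
        u128 x = ((u128)0x0123456789abcdefULL << 64)
               | 0xfedcba9876543210ULL;
        /* printf cannot format a __int128; print the two halves. */
        printf("%016llx%016llx\n",
               (unsigned long long)(x >> 64),
               (unsigned long long)x);
        return 0;
    }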
| retrac wrote: | > 64 bits provides 16 EiB (16 x 1024 x 1024 x 1024 x 1 GiB), | which is the sort of address space you might need for | byte-level addressing of a warehouse full of high-density HDDs. | Memory sizes don't grow like they used to | | An exabyte was an absolutely incomprehensible amount of memory, | once. Nearly as incomprehensible as 4 gigabytes seemed, at one | time. But as you note, 64 bits of addressable data can fit into | a single warehouse now. | | Going by the historical rate of increase, $100 would buy about | a petabyte of storage in 2040. Even presuming a major slowdown, | we still start running into 64 bit addressing as a practical | limit, perhaps sooner than you think. | Spooky23 wrote: | Storage and interconnect specs are getting a lot faster. I | could see a world where you treated an S3 scale storage system | as a giant tiered addressable memory space. AS/400 systems sort | of did something like that at a small scale. | eterevsky wrote: | 8 EiB of data (in case we want addresses using signed integers) | is around 20 metric tons of micro-SD cards one TB each | (assuming they weigh 2g each). This could probably fit in a | single shipping container. | hnuser123456 wrote: | shipping container = 40x8x8ft | | microsd card = 15x11x1mm, 0.5g | | fits 437,503,976 cards = 379 EiB, costs $43.7B | | 219 metric tons | | 8 EiB ~ 10,000,000 TB = fills the shipping container 2.2% | high or 56mm or 2 inches, 5 metric tons, costs $1B | | shipping containers are rated for up to 24 metric tons, so | ~40 EiB, $5B, 10 inches of cards, etc. | bombcar wrote: | Don't underestimate the bandwidth of a shipping container | filled with SD cards. | MR4D wrote: | But the cooling.... | | Seriously, that was great math you did there, and a neat way | to think about volume. That's a standard shipping container | [0], which is less than I thought it would be. | | [0] - | https://www.mobilemodularcontainers.com/products/storage-con... | tcoppi wrote: | Even assuming you are correct on all these points, ASLR is still | an important use case and the effective security of current | 64-bit address spaces is low. | anotherhue wrote: | FYI, such memory tagging has a rich history | https://en.wikipedia.org/wiki/Tagged_architecture | torginus wrote: | And there's another disadvantage to 128-bit pointers - memory | size and alignment. It would follow that each pointer field | would become 16 byte-aligned, and pointers would bloat up as | well, leading to even more memory consumption, especially in | languages that favor pointer-heavy structures. | | This was a major counterargument against 64-bit x86, where the | transition came out as a net zero in terms of performance, due | to the hit of larger pointer sizes counterbalanced by ISA | improvements such as more addressable registers. | | Many people in high-performance circles advocate using 32-bit | array indices as opposed to pointers, to counteract the cache | pollution effects. | akira2501 wrote: | I figure the cache is going to be your largest disadvantage, | and is the primary reason CPUs don't physically implement all | address bits and why canonical addressing was required to get | all this off the ground in the first place. | mhh__ wrote: | Systems for which pointers are not just integers have come and | gone, sadly. | | Many mainframes had function pointers which were more like a | struct than a pointer. | apaprocki wrote: | Itanium function pointers worked like that, so it could have | been the new normal if IA64 wasn't so crazy on the whole.
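(For readers who have not met them: an IA-64 function "pointer" denoted a descriptor rather than a code address, pairing the entry point with the value the callee expects in its gp register. A rough C sketch; the struct and field names are illustrative, not the actual ABI definitions:)

    #include <stdint.h>

    struct ia64_fdesc {
        uint64_t entry; /* address of the function's code */
        uint64_t gp;    /* global-pointer value for its module */
    };

(Comparing two such "pointers" for equality is where the fun starts, as the next link discusses.)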
| xxpor wrote: | A Raymond Chen post about the function pointer craziness: | https://devblogs.microsoft.com/oldnewthing/20150731-00/?p=90... | masklinn wrote: | Technically that's still pretty common on the software side, | that's what tagged pointers are. | | The ObjC/Swift runtime uses that for instance, the class | pointer of an object also contains the refcount and a few | flags. | mhh__ wrote: | Pointer tagging is still spiritually an integer (i.e. mask | off the bits you guarantee the tags are in via alignment) | versus (say) the pointer being multiple addresses into | different bits of different types of memory _and_ tags | insert other stuff here. | masklinn wrote: | > Pointer tagging is still spiritually an integer | | No, pointer tagging is spiritually a packed structure. | That can be a simple union (as it is in ocaml IIRC) but | it can be a lot more e.g. the objc "non-pointer isa" | ended up with 5 flags (excluding raw isa discriminant) | and two additional non-pointer members of 19 and 9 bits. | | You mask things on and off to unpack the structure into | components the system will accept. Nothing precludes using | tagged pointers to discriminate between kinds and | locations. | TheCondor wrote: | It's very difficult to see normal computers that normal people | use needing it any time soon, I agree. Frontier has 9.2PB of | memory though, so that's 50bits for a petabyte and then 4 more | bits, 54bits of memory addressability if we wanted to byte | address it all. Looking at it that way, if super computers | continue to be funded and grow like they have, we're getting | shockingly close to 64bits of addressable memory. | | I don't know that that really means we need 128bit, 80 or | 96bits buys a lot of time, but it's probably worth a little bit | of thought. | | I don't know how many of you remember the pre-386 days. It was | an effort to write interesting programs though, 512KB or 640KB | of memory to work with but it was 16bit addressable and so | you're writing code to manage segments and stuff, it's an extra | degree of complexity and a pain to debug. 32bits seemed like a | godsend when it happened. I imagine most of the dorks on here | have ripped a blu-ray or transcoded a video image from | somewhere, it's not super unusual to be dealing with a single | file that cannot be represented as bytes with a 32bit pointer. | | It's all about cost and value, 64bits is still a staggering | amount of memory but if the protein folding problems and | climate models and what have you need 80bits of memory to | represent the problem space, I would hope that the people | building those don't also have to worry about the memory "shoe | boxing" problems of yesteryear too. | aidenn0 wrote: | > ...which is the sort of address space you might need for | byte-level addressing of a warehouse full of high-density HDDs | | So if you want to mmap() files stored in your datacenter | warehouse, maybe you do need it? | lostmsu wrote: | So 64 bit address is only 1024 16TB HDDs? That number may go | down quickly. There is a 100TB SSD already. | Beltiras wrote: | As my siblings are pointing out your error, they are not | addressing what you are saying. You are absolutely correct. | It's conceivable within a few years to fit this much storage | in one device (might be a full rack full of disks but still). | ijlx wrote: | 1024*1024 16TB HDDs, or 1,048,576. | oofbey wrote: | 1,000,000 drives at 16 TB each I think. | | Kilo is 10 bits. Mega 20. Giga 30. Tera 40. 16TB is 44 bits.
1000* is another 10 bits so 54. | infinityio wrote: | Not quite - 1024GiB is 1TiB, so it's 1024 x 1024 x 16TiB | drives | lostmsu wrote: | Ah, parent already corrected their mistake. The comment I was | responding to was saying 16*1024*1024*1GB. | mort96 wrote: | We couldn't introduce a new 'middle' keyword, but could we say | 'int' is 32 bit, 'long' is 128 bit and 'short long' is 64 | bit...? | MikeHalcrow wrote: | I recall sitting in a packed room with over a hundred devs at the | 2004 Ottawa Linux Symposium while the topic of the number of | filesystem bits was being discussed (link: | https://www.linux.com/news/ottawa-linux-symposium-day-2/). I | recall people throwing out questions as to why we weren't just | jumping to 128 or 256 bits, and at one point someone blurted out | something about 1024 bits. Someone then made a comment about the | number of atoms in the universe, everyone chuckled, and the | discussion moved on. I sensed the feeling in the room was that | any talk of 128 bits or more was simply ridiculous. Mind you, this | was for storage. | | Fast-forward 18 years, and it's fascinating to me to see people | now seriously floating the proposal to support 256-bit pointers. | Blikkentrekker wrote: | > _How would this look in the kernel? Wilcox had originally | thought that, on a 128-bit system, an int should be 32 bits, long | would be 64 bits, and both long long and pointer types would be | 128 bits. But that runs afoul of deeply rooted assumptions in the | kernel that long has the same size as the CPU's registers, and | that long can also hold a pointer value. The conclusion is that | long must be a 128-bit type._ | | Can anyone explain the rationale for not simply naming types | after their size? In many programming languages, rather than this | arcane terminology, "i16", "i32", "i64", and "i128" simply exist. | creativemonkeys wrote: | I'm sure someone will come along and explain why I have no idea | what I'm talking about, but so far my understanding is those | names exist because of the difference in CPU word size. | Typically "int" represents the natural word size for that CPU, | which matches the register size as well, so 'int plus int' is | as fast as addition can run by default, on a variety of CPUs. | That's one reason chars and shorts are promoted to ints | automatically in C. | | Let's say you want to work with numbers and you want your | program to run as fast as possible. If you specify the number | of bits you want, like i32, then the compiler must make sure on | 64bit CPUs, where the register holding this value has an extra | 32bits available, that the extra bits are not garbage and | cannot influence a subsequent operation (like signed right | shift), so the compiler might be forced to insert an | instruction to clear the upper 32bits, and you end up with 2 | instructions for a single operation, meaning that your code now | runs slower on that machine. | | However, had you used 'int' in your code, the compiler would | have chosen to represent those values with a 64bit data type on | 64bit machines, and 32bit data type on 32bit machines, and your | code would run optimally, regardless of the CPU. This of course | means it's up to you to make sure that whatever values your | program handles fit in 32bit data types, and sometimes that's | difficult to guarantee.
| | If you decide to have your cake and eat it too by saying "fine, | I'll just select i32 or i64 at compile time with a condition" | and you add some alias, like "word" -> either i32 or i64, "half | word" -> either i16 or i32, etc depending on the target CPU, | then congrats, you've just reinvented 'int', 'short', 'long', | et al. | | Personally, I'm finding it useful to use fixed integer sizes | (e.g. int32_t) when writing and reading binary files, to be | able to know how many bytes of data to read when loading the | file, but once those values are read, I cast them to (int) so | that the rest of the program can use the values optimally | regardless of the CPU the program is running on. | nicoburns wrote: | That explains "int", but it doesn't explain short or long or | long long. Rust has "usize" for the "int" case, and then | fixed sizes for everything else, which works much better. If | you want portable software, it's usually more important to | know how many bits you have available for your calculation | than it is to know how efficiently that calculation will | happen. | masklinn wrote: | Legacy, there's lots of dumb stuff in C. As you note, in modern | languages the rule is generally to have fixed-size integers. | | Though I think there are portability concerns: that world is | mostly gone (it remains in some corners of computing, e.g. | DSPs), but if you're only using fixed-size integers, what do | you do when a platform doesn't have that size? With a more | flexible scheme, you have fewer issues there; however, as the | weirdness landscape contracts, the risk of making technically | incorrect assumptions (about relations between type sizes, or | the actual limits and behaviour of a given type) starts | increasing dramatically. | | Finally there's the issue at hand here: even with fixed-size | integers, "pointer" is a variable-size datum. So you still need | a variable-size integer to go with it. With C historically lacking | that (nowadays it's called uintptr_t), the kernel made | assumptions which are incorrect. | | Note that you can still get it wrong even if you try e.g. Rust | believes and generally assumes that usize and pointers | correspond, but that gets iffy with concepts like pointer | provenance, which decouple pointer size and address space. | mpweiher wrote: | > in modern languages the rule is generally to have | fixed-size integers. | | _Modern_ languages have unlimited size integers :-) | | "Modern" as in "since at least the 80s, more likely 70s". | fluoridation wrote: | Good luck using those to specify the data layout of a | network packet. | mpweiher wrote: | Well, for that you'd probably use a specialisation of | Integer that's bounded and can thus be represented in a | machine word. | fluoridation wrote: | And then you'll be wasting time marshaling data between | the stream and your objects because they're not PODs and | so you can't just memcpy() onto them. | josefx wrote: | You are supposed to stream your video data as base64 | encoded xml embedded in a json array. | kuratkull wrote: | Good luck seeing your performance drop off a very sharp | cliff if you start using larger numbers than your CPU can | fit into a single register. | mpweiher wrote: | Well, in those cases other languages fail. | | Either silently with overflows, usually leading to | security exploits, or by crashing. | | So in either case you are betting that these cases are | somewhere between rare and non-existent, particularly for | your core/performance intensive code.
| Being somewhat slower, probably in very isolated contexts (60-62 bits is quite a bit to overflow), but always correct, seems like the better tradeoff.
|
| YMMV. ¯\_(ツ)_/¯
| raverbashing wrote:
| > Legacy, there's lots of dumb stuff in C.
|
| Yes, this, so much this.
|
| Who cares what an 'int' or a 'long' is? Except for things like the size of a pointer, it's better if you know exactly what you're working with.
| PaulHoule wrote:
| C dates back to a time when the 8-bit byte didn't have 100% market share.
| yetihehe wrote:
| Plus, it was a language for writing systems, where "size of register on current machine" was a nice shortcut for "int", and registers could be anywhere from 8 to 32 bits, with 48 or 12 also a possibility.
| masklinn wrote:
| Except that hasn't been true in a while, and technically this assumption was not kosher for even longer: C itself only guarantees that int is 16 bits.
| cestith wrote:
| I have a couple of 12-bit machines upstairs. There were also 36-bit systems once upon a time.
| mhh__ wrote:
| C still (I think; C23 may have finally killed support*) supports architectures like ClearPath mainframes, which have a 36-bit word, a 9-bit byte, a 36 (IIRC) bit data pointer and an 81-bit function pointer.
|
| * The changes to the arithmetic rules mean you can't have sign-magnitude or 1s complement anymore, IIRC.
| wongarsu wrote:
| That's pretty much the mentioned proposal of "just use Rust types", which are i16/u16 to i128/u128, plus usize/isize for pointer-sized things.
|
| The only improvement you really need over that is to differentiate between what C calls size_t and uintptr_t: the size of the largest possible array, and the size of a pointer. On "normal" architectures they're the same, but on architectures that do pointer tagging or segmented memory a pointer might be bigger than the biggest possible array.
|
| But you still have to deal with legacy C code, and C was dreamt up when running code written for 16 bits on a 14-bit architecture without losing speed was a consideration, so the C types are weird.
| thrown_22 wrote:
| stdint.h has been around far longer than Rust.
|
| I've been using those types since the 00s for bit-banging code where I need guarantees for where each bit goes.
|
| Nothing quite like working with a microprocessor with 12-bit words to make you appreciate 2^n addresses.
| quonn wrote:
| I think that's because of portability, so that the common types just map to the correct size on a given system.
| Stamp01 wrote:
| C99 specifies stdint.h/inttypes.h as part of the standard library for exactly this purpose. I'd expect using it to be a best practice at this point. But I'm no C expert, so maybe there's a good reason for not always using those explicitly sized types.
| pjmlp wrote:
| Windows has macros for that kind of stuff, and only in C99 did the stdint header come to be.
|
| So you had almost three decades of everyone coming up with their own solution.
|
| To be fair, the other languages were hardly any better than C in this regard.
| m0RRSIYB0Zq8MgL wrote:
| That is what was suggested in the next paragraph.
|
| > But a better solution might just be to switch to Rust types, where i32 is a 32-bit, signed integer, while u128 would be unsigned and 128 bits. This convention is close to what the kernel uses already internally, though a switch from "s" to "i" for signed types would be necessary. Rust has all the types we need, he said, it would be best to just switch to them.
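Since several comments above lean on the stdint.h and size_t/uintptr_t distinctions, here is a minimal, self-contained C99 sketch of them. It is illustrative only, not code from the article or the kernel; note that uintptr_t is technically optional in the standard, though present on every mainstream platform.

    /* Fixed-width integers, plus the pointer-sized integer the kernel
     * historically spelled "long". */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Width is in the name, as in Rust's i32/u128. */
        int32_t  i32 = INT32_MIN;
        uint64_t u64 = UINT64_MAX;
        printf("i32=%" PRId32 " u64=%" PRIu64 "\n", i32, u64);

        /* uintptr_t: an integer wide enough to hold a pointer.
         * size_t: the size of the largest possible object.
         * They coincide on "normal" architectures, but the standard keeps
         * them distinct, which is exactly wongarsu's point about pointer
         * tagging and segmented memory. */
        int x = 0;
        uintptr_t p = (uintptr_t)&x;   /* pointer -> integer, well defined */
        int *q = (int *)p;             /* ...and back again */

        printf("long=%zu void*=%zu size_t=%zu uintptr_t=%zu (bytes)\n",
               sizeof(long), sizeof(void *), sizeof(size_t), sizeof(uintptr_t));
        return q == &x ? 0 : 1;
    }

On an LP64 Linux box the last line prints 8 for all four sizes, which is precisely the coincidence that the kernel's long-can-hold-a-pointer assumption bakes in.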
| [deleted]
| PaulDavisThe1st wrote:
| I remember a quote from the papers about Opal, an experimental OS that was intended to use h/w protection rather than virtual memory, so that all processes share the same address space and can just exchange pointers to share data:
|
| "A 64-bit memory space is large enough that if a process allocated 1MB every second, it could continue doing this until significantly past the expected lifetime of the sun before it ran into problems."
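For scale, a quick back-of-the-envelope on that remembered quote (my arithmetic, not the Opal paper's): draining a 64-bit address space at one mebibyte per second takes 2^44 seconds, i.e. hundreds of millennia.

    /* Back-of-the-envelope: how long to allocate through 2^64 bytes
     * at 1 MiB per second?  Illustrative arithmetic only. */
    #include <stdio.h>

    int main(void)
    {
        double bytes   = 18446744073709551616.0;        /* 2^64 */
        double rate    = 1024.0 * 1024.0;               /* 1 MiB/s */
        double seconds = bytes / rate;                  /* 2^44 s */
        double years   = seconds / (365.25 * 24.0 * 3600.0);
        printf("%.3e seconds, about %.0f years\n", seconds, years);
        return 0;
    }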
| torginus wrote:
| On a bit of a tangential note: RAM price for a given capacity used to decrease exponentially until the 2010s or so.
|
| Since then, it has only roughly halved. What happened?
|
| https://jcmit.net/memoryprice.htm
|
| I know it's not process geometry, since we went from 45nm -> 5nm in that time, roughly an 81x decrease.
|
| Is it realistic to assume scaling will resume?
| hnuser123456 wrote:
| We decided to slow down giving programmers excuses to make chat applications as heavy as web browsers.
| jaimehrubiks wrote:
| dredmorbius wrote:
| _As long as the posting of subscriber links in places like this is occasional, I believe it serves as good marketing for LWN - indeed, every now and then, I even do it myself. We just hope that people realize that we run nine feature articles every week, all of which are instantly accessible to LWN subscribers._
|
| -- Jonathan Corbet, LWN founder and grumpy editor in chief
|
| <https://news.ycombinator.com/item?id=1966033>
|
| Multiple other approvals: <https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...>
|
| Jon's own submissions: <https://news.ycombinator.com/submitted?id=corbet>
|
| And if we look for SubscriberLink submissions with significant (>20 comments) discussion ... they're showing up every few weeks, largely as Jon had requested.
|
| <https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...>
|
| That said, those who are able to comfortably subscribe and find this information useful: please _do_ support the site through subscriptions.
| nequo wrote:
| Whether the posting of subscriber links is "occasional" as of late is debatable.[1] Most of LWN's paywalled content is posted on HN.
|
| [1] https://news.ycombinator.com/item?id=32926853
| dredmorbius wrote:
| jaimehrubiks stated unequivocally, without substantiation, that "Somebody asked before to please not share lwn's SubscriberLinks". LWN's founder and editor has repeatedly stated otherwise, _hasn't_ criticised the practice, and participates in the practice himself, as recently as three months ago.
|
| SubscriberLinks are tracked by the LWN account sharing them. Abuse can be managed through LWN directly should that become an issue. Whether or not that's occurred in the past I've no idea, but the capability still exists and is permitted.
|
| No link substantiating jaimehrubiks' assertion seems to have been supplied yet.
|
| I'm going to take Corbet's authority on this.
| nequo wrote:
| Corbet repeatedly used the word "occasionally," sometimes even with emphasis.
|
| What I'm saying is that the current situation - where most of the for-pay content of LWN is available on HN - is at odds either with his wish that it be occasional or with my understanding of English.
| dredmorbius wrote:
| Most of those submissions die in the queue.
|
| I'd set a 20-comment limit on the search I presented for a reason. At present, the 30 results shown go back over 7 months. That's roughly one significant submission per week.
|
| Contrast a search for "lwn.net" alone in submissions: the first page of results (sorted by date, again 30 results) only goes back 3 weeks (22 days), and most of those get little activity --- some upvotes, and a few with many comments. In a third search, sorted by popularity over the past month,
|
| <https://hn.algolia.com/?dateRange=pastMonth&page=0&prefix=tr...>
|
| ten of those meet or beat my 20-comment threshold and 20 don't. And note that 20 comments isn't especially significant; 4 submissions exceed 100 comments.
|
| lwn SubscriberLink & >20 comments, by date: <https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...>
|
| All "lwn.net" for the past month: <https://hn.algolia.com/?dateRange=pastMonth&page=0&prefix=tr...>
|
| Data:
|
| Comments: 189 94 155 254 153 29 46 10 14 20 13 12 89 10 21 1 8 0 1 1 0 2 0 0 0 0 2 1 1 0
|
| Points: 306 271 254 240 166 114 109 89 62 58 53 45 42 39 37 30 20 7 5 5 5 4 4 4 4 4 3 3 3 3
|
| I'm not saying that the concern doesn't exist. But ultimately, it's LWN's to address. The constant admonishments to _not_ share links fall into tangential annoyances and generic tangents, both against HN guidelines: <https://news.ycombinator.com/newsguidelines.html>
|
| I'd suggest leaving this to Corbet and dang.
| munro wrote:
| There was a post a while back from NASA saying how many digits of Pi they actually need [1].
|
|     import math
|     pi = 31415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903 60
|     sign_bits = 1
|     sig_bits = math.ceil(math.log2(pi))
|     exp_bits = math.floor(math.log2(sig_bits))
|     assert sign_bits + sig_bits + exp_bits == 1209
|
| I'm sure I got something wrong here, def off-by-one, but roughly it looks like it would need 1209-bit floats (2048-bit rounded up!). IDK, mildly interesting. :>
|
| [1] https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimal...
| PaulDavisThe1st wrote:
| The size of required data types is mostly orthogonal to the size of memory addresses or filesystem offsets.
| jabl wrote:
| Pi is a bit special, because in order to get accurate argument reduction for trigonometric functions you need lots of digits (IIRC ~1000 for double precision).
|
| E.g. https://redirect.cs.umbc.edu/~phatak/645/supl/Ng-ArgReductio...
| vanderZwan wrote:
| The value of Pi you mention was the one in the question; the one in the answer is:
|
| > _For JPL's highest accuracy calculations, which are for interplanetary navigation, we use 3.141592653589793. Let's look at this a little more closely to understand why we don't use more decimal places. I think we can even see that there are no physically realistic calculations scientists ever perform for which it is necessary to include nearly as many decimal points as you present._
|
| That's sixteen digits, so a quick trip to the dev tools tells me:
|
|     >> Math.log2(3141592653589793)
|     -> 51.480417552782754
|
| The last statement of the text I quoted is more interesting, though. Although it's not surprising to me, given how many astronomers I know who joke that Pi equals three all the time.
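A quick way to check that figure without a browser console: a small C sketch (mine, purely illustrative) comparing the bits needed for JPL's sixteen digits of Pi against what an IEEE-754 double provides.

    /* log2(3141592653589793) is about 51.5 bits, and a double carries
     * DBL_MANT_DIG = 53 effective mantissa bits (DBL_DIG = 15 decimal
     * digits guaranteed), so JPL's value just fits in a plain double. */
    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double jpl_pi_digits = 3141592653589793.0;  /* 16 digits, point dropped */
        printf("bits needed : %.3f\n", log2(jpl_pi_digits));
        printf("double gives: %d mantissa bits, %d safe decimal digits\n",
               DBL_MANT_DIG, DBL_DIG);
        return 0;
    }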
| munro wrote:
| Lol, I should RTFA ;D
| vanderZwan wrote:
| Nah, just claim you were invoking Cunningham's Law ;)
| Beltiras wrote:
| I have horrid memories of debugging I had to do to get some god-awful Fourier transform to calculate with 15 digits of precision to fit a spec. It's right at the boundary where double precision stops being deterministic. Worst debugging week of my life.
| vanderZwan wrote:
| > _stops being deterministic_
|
| I'm imagining the maths equivalent of Heisenbugs; is that correct?
| Beltiras wrote:
| No, just having to match how Matlab did the calculation (development of an index) when implementing the _same thing_ in C++ (which necessitated showing that the calculation returned the same significant digits at the precision we expected). I've seen a Heisenbug and that was really weird. It happened during uni, so I didn't have to start tracing down compiler bugs. Not even sure if I could have; it happened with Java.
| rmorey wrote:
| This seems just a bit too early - so that probably means it's exactly the right time!
| marktangotango wrote:
| I was wondering if this (128-bit memory) is on the radar of any of the BSDs. Will they forever be stuck at 64 bits?
| fanf2 wrote:
| CheriBSD might be the first Unix-like with 128-bit pointers.
| amelius wrote:
| Perhaps it's an idea to make Linux parameterized on the pointer/word size, and let the compiler figure it out in the future.
| bitwize wrote:
| Pointers will get fat (CHERI and other tagged-pointer schemes) well before even server users need to byte-address more than 2^64 bytes' worth of stuff. So we should probably be realistically aiming for _256-bit_ architectures...
| t-3 wrote:
| Are there operations that vector processors are inherently worse at, or much harder to program for? Nowadays they seem to be mainly used for specialized tasks like graphics and machine-learning accelerators, but given the expansion of SIMD instruction sets, are general-purpose vector CPUs in the pipeline anywhere?
___________________________________________________________________
(page generated 2022-09-23 23:00 UTC)