[HN Gopher] The road to Zettalinux
       ___________________________________________________________________
        
       The road to Zettalinux
        
       Author : rwmj
       Score  : 265 points
       Date   : 2022-09-23 12:28 UTC (10 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | LinkLink wrote:
        | For reference, 2^64 = ~10^19.266. I don't think this is
        | unreasonable at all; it's unlikely that computers will stay
        | largely the same in the coming years. I believe we'll see many
        | changes to how things like mass addressing of data and computing
        | resources are done. Right now our limitations in these regards
        | are addressed by distributed computing and databases, but in a
        | hyper-connected world there may come a time when such a huge
        | address space could actually be used.
       | 
        | It's an unlikely hypothetical but imagine if fiber ran
        | everywhere, and all computers seamlessly worked together sharing
        | computer power as needed. Even 256 bits wouldn't be out of the
        | question then. And before you say something like that will never
        | happen, consider trying to convince somebody from 2009 that in
        | 13 years people would be buying internet money backed by
        | nothing.
        
         | est31 wrote:
          | The extra bits might be used for different things, as in
          | CHERI for example. The address space is still 64 bits, but 64
          | bits of metadata are added to it, so you get a 128-bit
          | architecture.
        
         | Tsarbomb wrote:
         | Not agreeing or disagreeing for the most part, but in 2009 all
         | of the prerequisite technology for cryptocurrency existed:
         | general purpose computers for the average person, accessible
         | internet, cryptographic algorithms and methods, and cheap
         | storage.
         | 
         | For 256 bit computers, we need entirely new CPU architectures
         | and updated ISAs for not just x86/AMD64, but for other archs
          | increasing in popularity such as ARM and even RISC-V. Even
          | then, compilers, build tools, and dependent devices with their
          | drivers need updates too. On top of all of this technical
          | work, you have the political work of getting people to agree
          | on new standards and methods.
        
           | mhh__ wrote:
           | 256 bits in the case of a worldwide mega-computer would be
           | such a huge departure from current architectures and more
            | importantly latency numbers that we can barely even speculate
           | about it.
           | 
            | It may be of note that hypothetically one can have a soft-ISA
            | 128-bit virtual address (a particularly virtual virtual
            | address) which is JITed down to a narrower physical address
            | by the operating system. This is, as far as I'm aware, how
            | IBM i works.
        
         | masklinn wrote:
          | I don't know, it seems excessive to me. I could see it for cold
          | storage maybe, with spanning storage pools (by my reckoning
          | there were 10TB drives in 2016 and the largest now are 20TB, so
          | 16 years from now they should be around 320TB if it keeps
          | doubling, which still leaves them about 5 orders of magnitude
          | below a full 64-bit space).
         | 
         | > Right now our limitations in these regards are addressed by
         | distributed computing and databases, but in a hyper-connected
         | world there may come a time when such huge address space could
         | actually be used.
         | 
         | Used at the core of the OS itself? How do you propose to beat
         | the speed of light exactly?
         | 
          | Because you don't need a zettabyte-compatible kernel to run a
          | distributed database (or even a file system, see ZFS), and
          | trying to DMA things on the other side of the planet sounds
          | like the worst possible experience.
          | 
          | Hell, our current computers are not even close to using full
          | 64-bit address spaces. The baseline is 48 bits, and x86 and ARM
          | are in the process of extending the address space (to 57 bits
          | for x86, and 52 for ARM).
        
           | alain94040 wrote:
           | Thanks to Moore's law, you can assume that DRAM capacity will
           | double every 1-3 years. Every time it doubles, you need one
           | more bit. So if we use 48 bits today, we have 16 bits left to
           | grow, which gives us at least 16 years of margin, and maybe
            | 48 years. (And it could be even longer if you believe that
            | Moore's law is going to keep slowing down.)
        
         | Deukhoofd wrote:
         | > all computers seamlessly worked together sharing computer
         | power as needed. Even 256 bits wouldn't be out of the question
         | then.
         | 
         | This sounds like it would be massively out of scope for Linux.
         | It'd require a complete overhaul of most of its core
         | functionality, and all of its syscalls. While not a completely
          | infeasible idea, it sounds to me like it'd require a
          | completely new kernel design.
        
         | jmillikin wrote:
          | > It's an unlikely hypothetical but imagine if fiber ran
          | > everywhere, and all computers seamlessly worked together
          | > sharing computer power as needed. Even 256 bits wouldn't be
          | > out of the question then.
          | 
          | You could do this today with 192 bits (a 128-bit IPv6 address
          | plus a 64-bit local pointer). Take a look at RDMA, which could
          | be summarized as "every computer's RAM might be any computer's
          | RAM".
         | 
         | The question is whether such an address makes sense for the
         | Linux kernel. If your hyper-converged distributed program wants
         | to call `read()`, does the pointer to the buffer really need to
         | be able to identify any machine in the world? Maybe it's enough
         | for the kernel to use 64-bit local pointers only, and have a
         | different address mechanism for remote storage.
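          | 
          | As a rough sketch of what such a "global pointer" could look
          | like (purely illustrative, not any real API):
          | 
          |     #include <stdint.h>
          | 
          |     /* Hypothetical 192-bit "global pointer": which machine,
          |        plus where inside that machine's 64-bit address
          |        space. */
          |     struct global_ptr {
          |         uint8_t  node[16];  /* 128-bit IPv6 address */
          |         uint64_t local;     /* pointer within that node */
          |     };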
        
       | krackout wrote:
        | I think the article is very shortsighted. By 2035-40 we'll
        | probably have memory-only (RAM) computers widely available. No
        | disks means no current OS is capable of handling these
        | computers; that's a change of paradigm needing new platforms and
        | OSes.
        | 
        | These future OSes may be 128-bit, but I don't think the current
        | ones will make it through the transition.
        
         | alpb wrote:
         | There are plenty of OSes today capable of booting and running
         | from RAM. Pretty sure we wouldn't be burning all the prominent
         | OSes for something like that.
        
       | zasdffaa wrote:
       | Sounds nuts. Does anyone know how much power a 32GB DIMM draws?
       | How much would a fully populated 64-bit address space therefore
       | pull?
       | 
        | Edit: if a 4GB DIMM (32 bits' worth of address space) pulls 1
        | watt, a full 64-bit space is 2^32 = ~4E9 times larger, so your
        | memory alone is pulling ~4 gigawatts. That's not supportable,
        | given the other electronics needed to go around it.
        
       | Beltiras wrote:
       | I would think by now any bunch of clever people would be trying
       | to fix the generalized problem of supporting n-bit memory
       | addressing instead of continually solving the single problem of
       | "how do we go from n*2 to (n+1)*2". I guess it's more practical
       | to just let the next generation of kernel maintainers go through
       | all of this hullabaloo again in 2090.
        
       | tomcam wrote:
       | Just wanted to say I love this discussion. Have been pondering
       | the need for a 128-bit OS for decades but several of the issues
       | raised were completely novel to me. Fantastic to have so many
       | people so much smarter than I am hash it out informally here.
       | Feels like a master class.
        
       | znpy wrote:
       | I wonder if with 128-bit wide pointers it would make sense to
       | start using early-lisp-style tagged pointers.
        
       | Aqueous wrote:
        | If we're going to go for 128, why not just go for 256? That way
        | we won't have to do this again for a while.
        | 
        | Or better yet, design a new abstraction that avoids hard-coding
        | the pointer size entirely and instead allows it to be extended
        | as more addressable space becomes a reality, instead of having
        | to transition over and over. Is this even possible? If it is,
        | shouldn't we head in that direction?
        
         | kmeisthax wrote:
         | The problem with variable-sized pointers is that...
         | 
         | 1. Any abstraction you could make will have worse performance
         | than a fixed-size machine pointer
         | 
          | 2. In order to support _any kind_ of variably-sized type you
          | need machine pointers to begin with, and those will always be
          | fixed-size, because variable sizes are even harder to support
          | in hardware than in native code
         | 
         | And furthermore going straight to 256 has its own problems.
         | Each time you double the pointer size you also significantly
         | increase the size of structures with a lot of pointers. V8
         | notably uses "pointer compression" - i.e. using 32-bit offsets
         | instead of 64-bit pointers, because it never needs >4GB of
         | JavaScript objects at once and JS objects are _very_ pointer-
         | ridden.
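          | 
          | A minimal sketch of the idea (hypothetical names, nothing like
          | V8's actual implementation):
          | 
          |     #include <stdint.h>
          | 
          |     /* All objects live inside one <= 4GB arena; a "compressed
          |        pointer" is a 32-bit offset from the arena base instead
          |        of a full 64-bit pointer. */
          |     static char *heap_base;   /* set when the arena is mapped */
          |     typedef uint32_t cptr;    /* compressed pointer */
          | 
          |     static inline cptr compress(void *p) {
          |         return (cptr)((char *)p - heap_base);
          |     }
          |     static inline void *decompress(cptr c) {
          |         return heap_base + c;
          |     }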
         | 
         | There's two forces at play here: pointers need to be small
         | enough to embed in any data structure and large enough to
         | address the entire working set of the program. Larger pointers
         | are not inherently better[0], and neither are smaller pointers.
         | It's a balancing act.
         | 
         | [0] ASLR, PAC, and CHERI are exceptions, as mentioned in the
         | original article.
        
       | tayistay wrote:
        | Is 128 bits the limit of what we would need? We already use
        | 128-bit UUIDs. 2^256 seems to be more than the number of atoms
        | on Earth.
        
         | wongarsu wrote:
         | The article does talk about just making the pointer type used
         | in syscalls 256 bit wide to "give room for any surprising
         | future needs".
         | 
         | The size of large networked disk arrays will grow beyond 64 bit
         | addresses, but I don't think we will exceed 2^128 bits of
         | storage of any size, for any practical application. Then again,
         | there's probably people who thought the same about 32 bit
         | addresses when we moved from 16bit to 32bit addresses.
         | 
          | The most likely case for "giant" pointers (more than 128 bits)
          | will be adding more metadata into the pointer. With time we
          | might find enough use cases to make it worth going to 256-bit
          | pointers, with a 96-bit address and 160 bits of metadata or
          | something like that.
        
           | dylan604 wrote:
           | >Then again, there's probably people who thought the same
           | about 32 bit addresses when we moved from 16bit to 32bit
           | addresses.
           | 
            | There's a fun "quote" about 640K being all anyone would ever
            | need, so clearly everyone just needs to settle down and
            | figure out how to refactor their code.
        
             | shadowofneptune wrote:
             | The IBM PC's 20-bit addressing was 16 times the size of
             | 16-bit addresses. From 20-bit to 32-bit, 4096 times larger.
             | 32 to 64 is 4,294,967,296 times larger (!). The scale alone
             | makes using all this space unlikely on a PC.
        
       | jupp0r wrote:
       | "The problem now is that there is no 64-bit type in the mix. One
       | solution might be to "ask the compiler folks" to provide a
       | __int64_t type. But a better solution might just be to switch to
       | Rust types, where i32 is a 32-bit, signed integer, while u128
       | would be unsigned and 128 bits. This convention is close to what
       | the kernel uses already internally, though a switch from "s" to
       | "i" for signed types would be necessary. Rust has all the types
       | we need, he said, it would be best to just switch to them."
       | 
        | Does anybody know why they don't use the existing fixed-size
        | integer types [1] from C99, i.e. uint64_t etc., and define a
        | 128-bit wide type on top of that (which will also be there in
        | C23 IIRC)?
        | 
        | My own kernel dev experience is pretty rusty at this point (pun
        | intended), but in the last decade of writing cross-platform
        | (desktop, mobile) userland C++ code I advocated exclusively for
        | using fixed-width types (std::uint32_t etc.) as well as
        | constants (UINT32_MAX etc.).
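        | 
        | For what it's worth, a sketch of what that looks like in C today
        | (the 128-bit type here is the __int128 extension that GCC and
        | Clang provide on 64-bit targets; C23's _BitInt is another
        | option):
        | 
        |     #include <stdint.h>
        |     #include <stdio.h>
        | 
        |     typedef unsigned __int128 u128;   /* compiler extension */
        | 
        |     int main(void) {
        |         uint64_t x = UINT64_MAX;
        |         u128 y = (u128)x * x;         /* 128-bit arithmetic */
        |         printf("%016llx%016llx\n",
        |                (unsigned long long)(y >> 64),
        |                (unsigned long long)(uint64_t)y);
        |         return 0;
        |     }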
        
       | ghoward wrote:
       | It would be sad if we, as an industry, do not take this
       | opportunity to create a better OS.
       | 
       | First, we should decide whether to have a microkernel or a
       | monolithic kernel.
       | 
       | I think the answer is obvious: microkernel. This is much safer,
       | and seL4 has shown that performance need not suffer too much.
       | 
       | Next, we should start by acknowledging the chicken-and-egg
       | problem, especially with drivers. We will need drivers.
       | 
        | So let's reuse Linux drivers by implementing a library for them
        | to run in userspace. This would be difficult, but not
        | impossible, and the rewards would be _massive_, basically
        | eliminating the chicken-and-egg problem for drivers.
       | 
       | To solve the userspace chicken-and-egg problem (having
       | applications that run on the OS), implement a POSIX API on top of
       | the OS. Yes, this will mean that some bad legacy like `fork()`
       | will exist, but it will solve that chicken-and-egg problem.
       | 
       | From there, it's a simple matter of deciding what the best design
       | is.
       | 
       | I believe it would be three things:
       | 
       | 1. Acknowledging hardware as in [1].
       | 
       | 2. A copy-on-write filesystem with a transactional API (maybe a
       | modified ZFS or BtrFS).
       | 
       | 3. A uniform event API like Windows' handles and Wait() functions
       | or Plan 9's file descriptors.
       | 
       | For number 3, note that not everything has to be a file, but
       | receiving events like signals and events from child processes
       | should be waitable, like in Windows or Linux's signalfd and
       | pidfd.
       | 
       | For number 2, this would make programming _so_ much easier on
       | everybody, including kernel and filesystem devs. And I may be
       | wrong, but it seems like it would not be hard to implement. When
       | doing copy-on-write, just copy as usual, and update the root
       | B-tree node; the transaction commits when the root B-tree node is
       | flushed to disk, and the flush succeeds.
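        | 
        | A toy sketch of that commit rule (made-up offsets, no
        | allocation, checksums, or real error handling):
        | 
        |     #include <stdint.h>
        |     #include <sys/types.h>
        |     #include <unistd.h>
        | 
        |     #define SUPERBLOCK_OFF 0   /* fixed superblock location */
        | 
        |     int commit(int fd, const void *new_root, size_t len,
        |                off_t new_root_off)
        |     {
        |         /* 1. Write new tree nodes into unused space, never
        |               in place, and make them durable. */
        |         if (pwrite(fd, new_root, len, new_root_off) < 0)
        |             return -1;
        |         if (fsync(fd) != 0)
        |             return -1;
        | 
        |         /* 2. Flip the root pointer in the superblock; the
        |               transaction commits when this flush succeeds. */
        |         uint64_t root = (uint64_t)new_root_off;
        |         if (pwrite(fd, &root, sizeof root, SUPERBLOCK_OFF) < 0)
        |             return -1;
        |         return fsync(fd);
        |     }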
       | 
       | (Of course, this would also require disks that don't lie, but
       | that's another problem.)
       | 
       | [1]: https://www.usenix.org/conference/osdi21/presentation/fri-
       | ke...
        
       | gwbas1c wrote:
       | > Matthew Wilcox took the stage to make the point that 64 bits
       | may turn out to be too few -- and sooner than we think
       | 
       | Let's think critically for a moment. I grew up in the 1980s and
       | 1990s, when we all craved more and more powerful computers. I
       | even remember the years when each generation of video games was
        | marketed as 8-bit, 16-bit, 32-bit, etc.
       | 
       | BUT: We're hitting a point where, for what we use computers for,
       | they're powerful enough. I don't think I'll ever need to carry a
       | 128-bit phone in my pocket, nor do I think I'll need a 128-bit
       | web browser, nor do I think I'll need a 128-bit web server. (See
       | other posts about how 64-bits can address massive amounts of
       | memory.)
       | 
       | Will we need 128-bit computing? I'm sure _someone_ will find a
        | need. But let's not assume they'll need an operating system
       | designed in the 1990s for use cases that we can't imagine today.
        
         | mikepurvis wrote:
         | IMO there's an important distinction to be made between high-
         | bit _addressing_ and high-bit _computing_.
         | 
         | Like, no one has enough memory to need more than 64 bits for
         | addressing, and that is likely to remain the case for the
         | foreseeable future. However, 128- and 256-bit values are
          | commonly used in domains like graphics, audio, and so on, where
         | you need to apply long chains of transformations and filters,
         | but retain as much of the underlying dynamic range as possible.
        
           | PaulDavisThe1st wrote:
           | Those are data types, not pointer values or filesystem
           | offsets. Totally different thing.
        
             | mikepurvis wrote:
             | Is that strictly the definition, that the bit-width of a
             | processor refers only to its native pointer size?
             | 
             | I know it's hardly a typical or modern example, but the N64
             | had just 4MB of memory (8MB with the expansion pack). It
             | most certainly didn't need 64-bit pointers to address that
             | pittance, so it was a "64 bit processor" largely for the
             | purposes of register/data size.
        
               | PaulDavisThe1st wrote:
               | Not so much a definition as where the problem lies.
               | 
               | If the thing that can refer to a memory address changes
               | size, there are very different problems than will arise
               | if the size of "an integer" changes.
               | 
               | You could easily imagine a processor that can only
               | address an N-bit address space, but can trivially do
               | arithmetic on N*M bit integers or floating point values.
               | And obviously the other way around, too.
               | 
               | In general, I think "N bit processor" tends to refer to
               | the data type sizing, but since those primitive data
               | types will tend to fit into the same registers that are
               | used to hold pointers, it ends up describing addressing
               | too.
        
         | uluyol wrote:
         | Supercomputers? Rack-scale computing? See some of the work
         | being done with RDMA and "far memory".
        
           | gwbas1c wrote:
           | Yes... And do you need that in the same kernel that goes into
           | your phone and your web server?
        
         | jerf wrote:
         | I think the argument for 128-bits being necessary strictly for
         | routing memory in a single "computer" is fairly weak. We're
          | already seeing memory sizes plateau nowadays; what once looked
          | reachable within just a couple of decades now recedes from us
          | as our progress goes from exponential to polynomial.
         | 
         | But the argument that we need more than 64 bit capability for a
         | lot of other reasons _in conjunction_ with memory
         | addressability is I think very strong. A lot of very powerful
         | and safe techniques become available if we can tag pointers
         | with more than a bit squeezed out here and a bit squeezed out
         | there. I could even see hard-coding the CPU to, say, look at 80
          | bits as an address and then use the remaining 48 for tagging
          | of various sorts. There's precedent, and 80 bits is an awful
          | lot of addressable memory; that's a septillion-plus
          | addressable bytes. By the time we need more than that, if we
          | do, our future selves can deal with it. (It is good to look
          | ahead a decade or two
         | and make reasonable preparations, as this article does; it is
         | hubris to start trying to look ahead 50 years or a century.)
        
         | wongarsu wrote:
         | We are talking about the OS kernel that supports 4096 CPU
          | cores. The companies who pay their engineers to do Linux kernel
         | development tend to be the same ones that have absurd needs.
        
           | p_l wrote:
            | And that's not enough to boot on some platforms that Linux
            | runs on (even with the amd64 ISA) unless one partitions the
            | computer into smaller CPU counts.
        
       | teddyh wrote:
        | Maybe we can finally fix intmax_t to be the largest integer type
       | again.
        
       | tonnydourado wrote:
       | The "In my shoes?" bit was hilarious
        
       | PaulHoule wrote:
        | On one hand, the IBM System/38 used 128-bit pointers in the
        | 1970s, despite having a 48-bit physical address bus. These were
        | used to manage persistent objects on disk or network, with
        | unique IDs a lot like UUIDs.
        | 
        | On the other hand, filling out a 64 bit address space looks
        | tough. I struggled to find something of the same magnitude of
        | 2^64 and I got 'number of iron atoms in an iron filing'. From a
        | nanotechnological point of view a memory bank that size is
        | feasible (it fits in a rack at 10,000 atoms per bit), but
        | progress in semiconductors is slowing down. Features are still
        | getting smaller, but they aren't getting cheaper anymore.
        
         | [deleted]
        
         | protomyth wrote:
          | The AS/400 and iSeries also use 128-bit pointers. 128 bits
          | would be useful for several identifiers already in common use,
          | such as ZFS block addresses and IPv6 addresses. I expect it
          | will be the last hop for a long time.
        
           | PaulHoule wrote:
           | Those are evolved from the System/38.
        
             | protomyth wrote:
             | Yeah, IBM is one company that shows how to push the models
             | down the road. They do take their legacy seriously.
        
           | EvanAnderson wrote:
           | In the context of the AS/400's single-level store
           | architecture the 128-bit pointers make a lot of sense, too.
        
         | bdn_ wrote:
         | I thought up a few ways to visualize 2^64 unique items:
         | 
         | - You could give every ant on Earth ~920 unique IDs without any
         | collisions
         | 
         | - You could give unique IDs for every brain neuron for all ~215
         | million people in Brazil
         | 
         | - The ocean contains about 20 x (2^64) gallons of water (3.5267
         | x 10^20 gallons total)
         | 
         | - There are between 100-400 billion stars in the Milky Way, so
         | you could assign each star between 46,000,000-184,000,000
         | unique IDs each
         | 
         | - You could assign ~2.5 unique IDs to each grain of sand on
         | Earth
         | 
         | - If every cell of your body contained a city with 500,000
         | people each, every "citizen" of your body could have a unique
         | ID without any collisions
         | 
         | Calculating these figures is actually a lot of fun!
        
           | tonnydourado wrote:
           | Those are great examples
        
           | Eduard wrote:
           | There are only ~368 grains of sand per ant?
        
             | fragmede wrote:
              | Well no. See, there's this one ant, Jeff, that's hogging
              | them all for itself, so each ant only gets 50 grains of
              | sand.
        
             | ManuelKiessling wrote:
             | No wonder I keep reading about sand shortages.
        
         | lloeki wrote:
         | > filling out a 64 bit address space looks tough. I struggled
         | to find something of the same magnitude of 2^64 and I got
         | 'number of iron atoms in an iron filing'
         | 
         | Reminded me of Jeff Bonwick's answer to the following question
         | about his 'boiling the oceans' quip related to ZFS being a "128
         | bit filesystem":
         | 
         | > 64 bits would have been plenty ... but then you can't talk
         | out of your ass about boiling oceans then, can you?
         | 
         | Sadly his Sun hosted blog was eaten by the migration to Oracle,
         | so thanks to the Internet Archive again:
         | 
         | http://web.archive.org/web/20061111054630/http://blogs.sun.c...
         | 
         | That one dives into some of the "handwaving" a bit:
         | 
         | https://hbfs.wordpress.com/2009/02/10/to-boil-the-oceans/
         | 
         | And that one goes into how much energy it would take to _merely
          | spin enough disks up_:
         | 
         | https://www.reddit.com/r/DataHoarder/comments/71p8x4/reachin...
        
           | PaulHoule wrote:
            | One absolute limit of computation is that it takes about
            | (1/2)kT of energy to delete one bit of information, where k
            | is the Boltzmann constant and T is the temperature. Let
            | T = 300 K (room temperature).
            | 
            | I multiplied that by 2^128 and got 1.41x10^18 J of energy. A
            | kiloton of TNT is 4.2x10^12 J, so that is roughly a 335
            | megaton explosion worth of energy just to boot.
            | 
            | That's not impossible (it's about the heat a large nuclear
            | reactor puts out over a decade), but if you want to go
            | faster you need a bigger system, and a bigger system will be
            | slower because of light speed latency.
            | 
            | (You do better, however, at a lower temperature, say 1 K,
            | but heat extraction gets more difficult at lower
            | temperatures and you spend energy on refrigeration, unless
            | you wait long enough for the Universe to grow colder.)
        
             | tuatoru wrote:
              | > 1.41x10^18 J of energy.
             | 
              | Used over the course of a year, that is a constant 44.4
              | GW. Comparable to what Bitcoin already uses.
        
             | turtletontine wrote:
              | In fairness, the _need_ for 128-bit addressable systems
              | will arise when 64 address bits are not enough. That will
              | be long before people are using 2^128 bytes on one system.
              | So doing the calculation with 2^65 bytes would be a more
              | even-handed estimate of the machine that would require
              | this.
        
         | tuatoru wrote:
          | > On one hand, the IBM System/38 used 128-bit pointers in the
          | > 1970s, despite having a 48-bit physical address bus.
         | 
         | And the original processor was 24 bits, then it was upgraded to
         | 36 bits (not a typo: 36 bits), and then to POWER 64 bits.
         | 
         | (When that last happened, it was re-badged AS/400. Later,
         | marketing renamed the AS/400 to iSeries, and then to IBM i,
         | without changing anything significant. Still uses Power CPUs,
         | AFAIK).
         | 
         | For users, upgrades were a slightly longer than usual backup
         | and restore.
         | 
         | What's the hard part here?
        
         | mhh__ wrote:
          | IBM i is still in development and also has 128-bit pointers.
        
       | wongarsu wrote:
       | What's the average lifespan of a line of kernel code? I imagine
       | by starting this project 12 years before its anticipated use case
       | they can get very far just by requiring that any new code is
       | 128-bit compatible (in addition to doing the broader
       | infrastructure changes needed like fixing the syscall ABI)
        
         | rwmj wrote:
         | _> What 's the average lifespan of a line of kernel code?_
         | 
         | There's a fun tool called "Git of Theseus" which can answer
         | this question! You can see some graphs of Linux code on the web
         | page: https://github.com/erikbern/git-of-theseus
         | 
         | Named after the Ship of Theseus:
         | https://en.wikipedia.org/wiki/Ship_of_Theseus
        
           | forgotpwd16 wrote:
            | There are some more graphs in the article presenting the tool:
           | https://erikbern.com/2016/12/05/the-half-life-of-
           | code.html#:...
           | 
            | A (Linux) kernel line has a half-life of 6.6 years, the
            | highest of the projects analyzed. The lowest went to
            | Angular, with a half-life of 0.32 years.
        
         | trasz wrote:
         | Not sure about those 12 years - 128-bit registers are already
          | there, and the CHERI Morello prototype is at a "physical silicon
         | using this functionality under CheriBSD" stage.
        
       | jmillikin wrote:
       | The section about 128-bit pointers being necessary for expanded
       | memory sizes is unconvincing -- 64 bits provides 16 EiB (16 x
       | 1024 x 1024 x 1024 x 1 GiB), which is the sort of address space
       | you might need for byte-level addressing of a warehouse full of
       | high-density HDDs. Memory sizes don't grow like they used to, and
       | it's difficult to imagine what kind of new physics would let
       | someone fit that many bytes into a machine that's practical to
       | control with a single Linux kernel instance.
       | 
       | CHERI is a much more interesting case, because it expands the
       | definition of what a "pointer" is. Most low-level programmers
       | think of pointers as _just_ an address, but CHERI turns it into a
       | sort of tuple of (address, bounds, permissions) -- every pointer
       | is bounds-checked. The CHERI folks did some cleverness to pack
       | that all into 128 bits, and I believe their demo platform uses
       | 128-bit registers.
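        | 
        | Conceptually (ignoring the actual compressed encoding), a
        | capability is something like:
        | 
        |     #include <stdint.h>
        | 
        |     /* Conceptual sketch of a CHERI-style capability; the real
        |        thing packs this into 128 bits plus an out-of-band
        |        validity tag. */
        |     struct capability {
        |         uint64_t address;      /* where it points         */
        |         uint64_t base, limit;  /* bounds it may touch     */
        |         uint32_t permissions;  /* load/store/execute/...  */
        |     };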
       | 
       | The article also touches on the UNIX-y assumption that `long` is
       | pointer-sized. This is well known (and well hated) by anyone that
       | has to port software from UNIX to Windows, where `long` and `int`
       | are the same size, and `long long` is pointer-sized. I'm firmly
       | in the camp of using fixed-size integers but the Linux kernel
       | uses `long` all over the place, and unless they plan to do a mass
       | migration to `intptr_t` it's difficult to imagine a solution that
       | would let the same C code support 32-, 64-, and 128-bit
       | platforms.
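        | 
        | To make the trap concrete, a quick check of the sizes you'd
        | typically get (LP64 on 64-bit Linux vs. LLP64 on 64-bit
        | Windows):
        | 
        |     #include <stdint.h>
        |     #include <stdio.h>
        | 
        |     int main(void) {
        |         /* LP64: 8/8/8.  LLP64: 4/8/8, so casting a pointer to
        |            long silently truncates it there. */
        |         printf("long=%zu void*=%zu uintptr_t=%zu\n",
        |                sizeof(long), sizeof(void *), sizeof(uintptr_t));
        |         return 0;
        |     }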
       | 
       | (comedy option: 32-bit int, 128-bit long, and 64-bit `unsigned
       | middle`)
       | 
       | The article also mentions Rust types as helpful, but Rust has its
       | own problems with big pointers because they inadvisably merged
       | `size_t`, `ptrdiff_t`, and `intptr_t` into the same type. They're
       | working on adding equivalent symbols to the FFI module[0], but
       | untangling `usize` might not be possible at this point.
       | 
       | [0] https://github.com/rust-lang/rust/issues/88345
        
         | travisgriggs wrote:
         | > (comedy option: 32-bit int, 128-bit long, and 64-bit
         | `unsigned middle`)
         | 
         | Or rather than keep moving the long goalpost, keep long at
         | u64/i64 and add prolong(ed) for 128. Or we could keep long as
         | the "nominal" register value, and introduce "short long" for
         | 64. So many options.
        
         | dragontamer wrote:
         | > it's difficult to imagine what kind of new physics would let
         | someone fit that many bytes into a machine that's practical to
         | control with a single Linux kernel instance.
         | 
         | I nominally agree with most of your post. But I should note
         | that modern systems seem to be moving towards a "one pointer
         | space" for the entire cluster. For example, 8 GPUs + 2 CPUs
         | would share the same virtual memory space (GPU#1 may take one
         | slice, GPU#2 takes another, etc. etc.).
         | 
         | This allows for RDMA (ie: mmap across Ethernet and other
         | networking technologies). If everyone has the same address
         | space, then you can share pointers / graphs between nodes and
         | the underlying routing/ethernet software will be passing the
          | data automatically between all systems. It's actually quite
          | convenient.
          | 
          | I don't know how the supercomputer software works, but I can
          | imagine 4000 CPUs + 16000 GPUs all sharing the same 64-bit
          | address space.
        
           | _the_inflator wrote:
           | I agree with you.
           | 
           | And seeing datacenter after datacenter shooting up like
            | mushrooms, there might be some sort of abstraction heading
            | in this direction that makes 128-bit addresses feasible. At
            | the moment 64-bit seems like paging in this sense.
        
           | shadowofneptune wrote:
           | Even so, existing ISAs could address more memory using
           | segmentation. AMD64 has a variant of long mode where the
           | segment registers are re-enabled. For the special programs
           | that need such a large space, far pointers wouldn't be that
           | complicating.
        
           | rwmj wrote:
           | Distributed Shared Memory is a thing, but I'm not sure how
           | widely it is used. I found that it gives you all the
           | coordination problems of threads in symmetric multiprocessing
           | but at a larger scale and with much slower synchronisation.
           | 
           | https://en.wikipedia.org/wiki/Distributed_shared_memory
        
             | dragontamer wrote:
             | https://en.wikipedia.org/wiki/Remote_direct_memory_access
             | 
             | Again, I'm not a supercomputer programmer. But the
             | whitepapers often discuss RDMA.
             | 
              | From my imagination, it sounds like any other "mmap". You,
              | the programmer, just remember that the mmap'd region is
              | slower (since it is a read/write to a disk, rather than to
              | RAM). Otherwise, you treat it "like RAM" from a
              | programming perspective, purely for convenience's sake.
             | 
             | As long as you know my_mmap_region->next = foobar(); is a
             | slow I/O operation pretending to be memory, you're fine.
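              | 
              | Roughly like an ordinary file-backed mapping (no RDMA
              | here, just the analogy; assumes data.bin exists and is at
              | least a page long, error handling omitted):
              | 
              |     #include <fcntl.h>
              |     #include <sys/mman.h>
              |     #include <unistd.h>
              | 
              |     struct node { long value; long next; };
              | 
              |     int main(void) {
              |         int fd = open("data.bin", O_RDWR);
              |         struct node *nodes =
              |             mmap(NULL, 4096, PROT_READ | PROT_WRITE,
              |                  MAP_SHARED, fd, 0);
              |         /* looks like a memory write, is really I/O */
              |         nodes[0].value = 42;
              |         munmap(nodes, 4096);
              |         close(fd);
              |         return 0;
              |     }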
             | 
             | ---------
             | 
             | Modern systems are converging upon this "single address
             | space" programming model. PCIe 3.0 implements atomic
             | operations and memory barriers, CXL is going to add cache-
             | coherence over a remote / I/O interface. This means that
             | all your memory_barriers / atomics / synchronization can be
             | atomic-operations, and the OS will automatically translate
             | these memory commands into the proper I/O level
             | atomics/barriers to ensure proper synchronization.
             | 
              | This is all very new, only within the past few years. But
              | I think it's one of the most exciting things about modern
              | computer design.
              | 
              | Yes, it's slow. But it's consistent and accurately modeled
              | by all elements in the chain. Atomic compare-and-swap over
              | RDMA can allow for cache-coherent communications and
              | synchronization over Ethernet, over GPUs, over CPUs, and
              | any other accelerators sharing the same 64-bit memory
              | space. Maybe not quite today, but soon.
              | 
              | This technology already exists for PCIe 3.0 CPU+GPU
              | synchronization from 8 years ago (Shared Virtual Memory).
              | It's exciting to see it extend out into more I/O devices.
        
               | FuriouslyAdrift wrote:
               | RDMA is used heavily in SMB3 file systems for Microsoft
                | Hyper-V failover clusters.
        
               | c-linkage wrote:
               | Distributed Memory Access is just another kind of Non-
               | Uniform Memory Access, which is Yet Another Leaky
               | Abstraction. Specifically, if you care about performance
               | _at all_ you now have to worry about _where in RAM your
               | data lives_.
               | 
               | Caring about where in memory your data lives is different
               | from dealing with cache or paging. Programmers have to
               | plan ahead to keep frequently accessed data in fast RAM,
               | and infrequently accessed data in "slow" RAM. You'll
               | probably need special APIs to allocate and manage memory
               | in the different pools, not unlike the Address Windowing
               | Extensions API in Microsoft Windows.
               | 
               | And once you extend "memory" outside the chassis, you'll
               | have to design your application with the expectation that
               | _any memory access could fail_ because a network failure
               | means the memory is no longer accessible.
               | 
               | If you only plan to deploy in a data center then _maybe_
               | you can ignore pointer faults, but that is still a risk,
               | especially if you decide to deploy something like Chaos
               | Monkey to test your fault tolerance.
        
               | zozbot234 wrote:
               | > And once you extend "memory" outside the chassis,
               | you'll have to design your application with the
               | expectation that any memory access could fail because a
               | network failure means the memory is no longer accessible.
               | 
               | You have to deal with these things anyway in any kind of
               | distributed setting. What this kind of location-
               | independence via SSI really buys you is the ability to
               | scale the exact same workloads _down_ to a single cluster
               | or even a single node when feasible, while keeping an
               | efficient shared-memory programming model instead of
               | doing slow explicit message passing. It seems like a
               | pretty big simplification.
        
               | jandrewrogers wrote:
               | I've written code for a few different large-scale SSI
               | architectures. The shared-memory programming model is
               | _less_ efficient than explicit message passing in
               | practice because it is much more difficult to optimize.
               | The underlying infrastructure is essentially converged at
               | this point, so performance mostly comes down to the
               | usability of the programming model.
               | 
               | The marketing for SSI was that it was simple because
               | programmers would not have to learn explicit message
               | passing. Unfortunately, people that buy supercomputers
               | tend to care about performance, so designing a
               | supercomputer that is difficult to optimize misses the
               | point. In real code, the only way to make them perform
               | well was to layer topology-aware message passing on top
               | of the shared memory model. At which point you should've
               | just bought a message passing architecture.
               | 
               | There is only one type of large-scale SSI architecture
               | that is able to somewhat maintain the illusion of uniform
               | shared memory -- hardware latency-hiding e.g. barrel
               | processors. If programmers have difficulty writing
               | scalable code with message passing, then they
               | _definitely_ are going to struggle with this. These
               | systems use a completely foreign programming paradigm
                | that looks deceptively like vanilla C++. Exceptionally
                | efficient, and companies design new ones every few
                | years, but without programmers who grok how to write
                | optimal code they aren't much use.
        
             | sakras wrote:
             | At least at this latest SIGMOD, it felt like everyone and
             | their dog was researching databases in an RDMA
             | environment... so I'd imagine this stuff hasn't peaked in
             | popularity.
        
             | throw10920 wrote:
             | That's what I was going to chime in with - you pay for that
             | extra address width. Binary addition and multiplication
             | latency is super-linear with regards to operand width.
             | Larger pointers lead to more memory use, and memory access
             | latency is non-constant with respect to size.
             | 
             | It _might_ make sense for large distributed systems to move
             | to a 128-bit architecture, but I don 't see any reason for
             | consumer devices, at least with current technology.
        
               | dragontamer wrote:
               | > Binary addition ... is super-linear with regards to
               | operand width
               | 
                | No it's not. That's why Kogge-Stone's carry-lookahead
                | adder was such an amazing result: O(log(n)) latency with
                | respect to operand width with O(n) total half-adders
                | used.
                | 
                | It may seem like it's super-linear. But the power of
                | prefix sums leads to a spectacular and elegant solution.
                | Kogge-Stone (and the concept of prefix sums) is one of
                | the most important parallel-programming / parallel-
                | system results of the last 50 years. Dare I say it, it's
                | _THE_ most important parallel programming concept.
               | 
               | > multiplication latency
               | 
               | You could just... not implement 128-bit multiplication.
               | Just support 128-bit pointers (aka: addition) and leave
               | multiplication for 64-bits and below.
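                | 
                | A toy software model of the trick (not how the hardware
                | is laid out, but the same log-depth prefix combine):
                | 
                |     #include <stdint.h>
                |     #include <stdio.h>
                | 
                |     /* Kogge-Stone style addition: 6 combining
                |        steps for 64 bits instead of a 64-step
                |        ripple carry. */
                |     static uint64_t ks_add(uint64_t a, uint64_t b)
                |     {
                |         uint64_t g = a & b;  /* carry generate  */
                |         uint64_t p = a ^ b;  /* carry propagate */
                |         for (int d = 1; d < 64; d <<= 1) {
                |             g |= p & (g << d);
                |             p &= p << d;
                |         }
                |         /* carry into bit i is g at bit i-1 */
                |         return (a ^ b) ^ (g << 1);
                |     }
                | 
                |     int main(void)
                |     {
                |         uint64_t a = 0xDEADBEEFCAFEBABEull;
                |         uint64_t b = 0x0123456789ABCDEFull;
                |         printf("%d\n", ks_add(a, b) == a + b);
                |         return 0;
                |     }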
        
               | throw10920 wrote:
               | You're right, binary addition isn't super-linear. It is
               | non-constant, though, which is a slightly surprising
               | result if you don't know much about hardware.
        
               | dragontamer wrote:
                | Kogge-Stone's O(log2(n)) latency complexity might as well
               | be constant. The difference between 64-bit and 128-bit is
               | the difference between 6 and 7.
               | 
               | There's going to be no issues implementing a 128-bit
               | adder. None at all.
        
               | pclmulqdq wrote:
               | Multiplication of 128 bit numbers is also not a big
               | issue. Today you can do it with 4 MULs and some addition,
               | and it would still be faster than 64-bit division.
               | Hardware multiplication of 128-bit numbers would have
               | area problems more than speed problems. You could always
               | throw out the top 128 bits of the result (since 128x128
               | has a 256 bit result), and the circuit wouldn't be much
               | bigger than a full 64 bit multiplier.
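                | 
                | For the software version, a sketch of the low half of
                | such a product, with the compiler's 64x64->128 multiply
                | standing in for the hardware MUL primitive:
                | 
                |     #include <stdint.h>
                | 
                |     typedef unsigned __int128 u128; /* gcc/clang */
                | 
                |     struct u128s { uint64_t lo, hi; };
                | 
                |     /* Low 128 bits of a 128x128 product from three
                |        64x64->128 multiplies; a fourth (hi*hi) only
                |        matters for the upper 128 bits. */
                |     static struct u128s mul128_lo(struct u128s a,
                |                                   struct u128s b)
                |     {
                |         u128 p = (u128)a.lo * b.lo;
                |         uint64_t cross = a.lo * b.hi + a.hi * b.lo;
                |         struct u128s r = {
                |             (uint64_t)p,
                |             (uint64_t)(p >> 64) + cross
                |         };
                |         return r;
                |     }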
        
               | meisanother wrote:
               | There's also something beautiful about seeing or creating
               | a Kogge-Stone implementation on silicon.
               | 
                | I know it was one of the first times I thought to
                | myself: this is not just a straightforward pipeline, yet
                | it all follows such a beautifully geometrical
                | interconnect pattern. Super fast, yet very elegant to
                | lay out.
        
               | dragontamer wrote:
               | The original paper is a masterpiece to read as well, if
               | you haven't read it.
               | 
               | "A Parallel Algorithm for the Efficient Solution of a
                | General Class of Recurrence Equations", by Kogge and
               | Stone.
               | 
               | It proves the result for _all_ associative operations
               | (technically, a class slightly larger than associative.
               | Kogge and Stone called this a "semi-associative"
               | operation).
        
               | meisanother wrote:
               | Well, just got it. Thanks for the reference!
               | 
                | A bit sad that 1970s papers are still behind an IEEE
                | paywall...
                | 
                | Edit: Just finished reading it. I have to say that the
                | generalization in 3.2 went a bit over my head, but
                | otherwise it's pretty amazing that they could define
                | such a generalization. The intuition for those types of
                | problems is often to proceed one step at a time, N
                | times.
                | 
                | That it is provably doable in log2(N) steps is great,
                | especially since it allows for a choice of the
                | depth/number of processors you want to use for the
                | problem. Hopefully next time I design a latency-
                | constrained system I'll remember to look at that
                | article.
        
               | dragontamer wrote:
               | > Hopefully next time I design a latency-constrained
               | system I remember to look at that article
               | 
               | Nah. Your next step is to read "Data parallel algorithms"
               | by Hillis and Steele, which starts to show how these
                | principles can be applied to code. (A much higher-level,
                | easier-to-follow paper. From the ACM too, so it's free
                | since it's older than 2000.)
               | 
               | Then you realize that all you're doing is following the
               | steps towards "Map-reduce" and modern parallel code and
               | just use Map Reduce / NVidia cub::scan / etc. etc. and
               | all the modern stuff that is built from these fundamental
               | concepts.
               | 
               | Kogge and Stone's paper sits at the root of it all
               | though.
        
           | cmrdporcupine wrote:
           | It seems to me that such a memory space could be physically
           | mapped quite large while still presenting 64-bit virtual
           | memory addresses to the local node? How likely is it that any
           | given node would be mapping out more than 2^64 bytes worth of
           | virtual pages?
           | 
           | The VM system could quite simply track the physical addresses
           | as a pair of `u64_t`s or whatever, and present those pages as
           | 64-bit pointers.
           | 
           | It seems in particular you might want to have this anyways,
           | because the actual costs for dealing with such external
           | memories would have to be much higher than local memory.
           | Optimizing access would likely involve complicated cache
           | hierarchies.
           | 
           | I mean, it'd be exciting if we had need for memory space
           | larger than 2^64 but I just find it implausible with current
           | physics and programs? But I'm also getting old.
        
             | rektide wrote:
              | Leaving a cluster-coherent address space behind, like you
              | say, is doable. But you lose what the parent was saying:
             | 
             | > _If everyone has the same address space, then you can
             | share pointers / graphs between nodes and the underlying
             | routing/ethernet software will be passing the data
             | automatically between all systems. Its actually quite
             | convenient._
        
               | vlovich123 wrote:
               | Let's say you have nodes that have 10 TiB of RAM in them.
               | You then need 1.6M nodes (not CPUs, but actual boxes) to
               | use up 64bits of address space. It seems like the
               | motivation is to continue to enable Top500 machines to
               | scale. This wouldn't be coming to a commercial cloud
               | offering for a long time.
        
               | rektide wrote:
               | Why limit yourself to in-memory storage? I'd definitely
               | assume we have all our storage content memory mapped onto
               | our cluster too, in this world. People have been building
                | exabyte (1M terabytes) scale datacenters since well
               | before 2010, and 16 exabytes, the current Linux limit
               | according to the most upvoted post here, isn't that much
               | more inconceivable.
               | 
               | Having more space available usually opens up more
               | interesting possibilities. I'm going to rattle off some
               | assorted options. If there's multiple paths to a given
               | bit of data, we could use different addresses to refer to
               | different paths. We could do something like ILA in IPv6,
               | using some of the address as a location identifier:
               | having enough bits for both the location and the identity
               | parts of the address without being too constrained would
               | be helpful. We could use the extra pointer bits for
               | tagged memory or something like CHERI, which allow all
               | kinds of access-control or permission or security
               | capabilities. Perhaps we create something like id's
               | MegaTexture, where we can procedurally generate data on
               | the fly if given an address. There's five options for why
               | you'd want more address space than addressable storage.
               | And I think some folks are already going to be quite
               | limited & have quite a lot of difficulty partitioning up
               | their address space, if they only have for example 1.6m
               | buckets of 1TB (one possible partitioning scheme).
               | 
               | The idea of being able to refer to everything anywhere
               | that does or did exist across a very large space sure
               | seems compelling & interesting to me!
        
               | vlovich123 wrote:
               | Maybe. You are paying a significant performance penalty
               | for ALL compute to provide that abstraction though.
        
               | Aperocky wrote:
               | Sounds like a disaster in terms of potential bugs.
        
               | jerf wrote:
               | It is, but it's the same disaster of bugs we already have
               | from multiple independent cores sharing the same memory
               | space, not a brand new disaster or anything.
               | 
               | It's a disaster of latency issues too, but it's not like
               | that's surprising anyone either, and we already have NUMA
               | on some multi-core systems which is the same problem.
               | 
               | We have existing tools that can be extended in
               | straightforward ways to deal with these issues. And it's
               | not like there's a silver bullet here; having separate
               | address spaces everywhere comes with its own disaster of
               | issues. Pick your poison.
        
               | sidewndr46 wrote:
               | Instead of having another thread improperly manipulating
               | your pointers and scribbling all over memory, now you can
               | have an entire cluster of distributed machines doing it.
               | This is a clear step forward.
        
               | rektide wrote:
                | Not that the industry doesn't broadly deserve this FUD
               | take/takedown, but perhaps possibly maybe it might end up
               | being really good & useful & clean & clear & lead to very
               | high functioning very performant very highly observable
               | systems, for some.
               | 
               | Having a single system image has many potential upsides,
               | and understandability & reasonability are high among
               | them.
        
               | yencabulator wrote:
               | Just because you can refer to the identity of a thing
               | anywhere in the cluster doesn't mean it can't also be
               | memory-safe, capability-based, and just an RPC.
        
             | maxwell86 wrote:
             | > How likely is it that any given node would be mapping out
             | more than 2^64 bytes worth of virtual pages?
             | 
             | In the Grace Hopper whitepaper, NVIDIA says that they
             | connect multiple nodes with a fabric that allows them to
              | create a virtual address space across all of them.
        
         | dylan604 wrote:
         | >(comedy option: 32-bit int, 128-bit long, and 64-bit `unsigned
         | middle`)
         | 
          | Rather than unsigned middle, could we just call it malcolm?
        
         | cmrdporcupine wrote:
         | Thanks for pointing out the `usize` ambiguity. It drives me
         | nuts. I suspect it would make me even crazier if I was doing
         | embedded with Rust right now and had to work with unsafe
         | hardware pointers.
         | 
          | (They also fail to split out an equivalent of `off_t`, too.
          | Not that I think that would have the same bit-width
          | ambiguities. But it seems odd to refer to offsets by a
          | 'size'.)
        
         | api wrote:
         | I can't imagine a single Linux kernel instance or single
         | program controlling that much bus-local RAM, but as you say
         | there are other uses.
         | 
         | One use I can imagine is massively distributed computing where
         | pointers can refer to things that are either local or remote.
         | These could even map onto IPv6 addresses where the least
         | significant 64 bits are a local machine pointer and the most
         | significant 64 bits are the machine's /64. Of course the
         | security aspect would have to be handled at the transport layer
         | or this would have to be done on a private network. The latter
         | would be more common since this would probably be a
         | supercomputer thing.
         | 
         | Still... I wonder if this needs CPU support or just compiler
         | support? Would you get that much more performance from having
         | this in hardware?
         | 
         | I do really like how Rust has u128 native in the language. This
         | permits a lot of nice things including efficient implementation
          | of some cryptography and math stuff. C has irregular support
          | for 128-bit integers (__int128), but it's not really a
          | first-class citizen.
        
           | gjvc wrote:
           | _I can 't imagine a single Linux kernel instance or single
           | program controlling that much bus-local RAM, but as you say
           | there are other uses._
           | 
           | MS-DOS and 640K ...
        
             | api wrote:
             | The addressable size grows exponentially with more bits,
             | not linearly. 2^64 is not twice as big as 2^32. It's more
             | than four billion times as big. 2^32 was only 65536 times
             | as big as 2^16.
             | 
             | Going past 2^64 bytes of local high speed RAM becomes a
             | physics problem. I won't say never but it would not just be
             | an evolutionary change from what we have and a processor
             | that could perform useful computations on that much data
             | would be equally nuts. Just moving that much data on a bus
             | of today would take too long to be useful, let alone
             | computing on it.
        
         | retrac wrote:
         | > 64 bits provides 16 EiB (16 x 1024 x 1024 x 1024 x 1 GiB),
         | which is the sort of address space you might need for byte-
         | level addressing of a warehouse full of high-density HDDs.
         | Memory sizes don't grow like they used to
         | 
         | An exabyte was an absolutely incomprehensible amount of memory,
         | once. Nearly as incomprehensible as 4 gigabytes seemed, at one
         | time. But as you note, 64 bits of addressable data can fit into
         | a single warehouse now.
         | 
         | Going by the historical rate of increase, $100 would buy about
         | a petabyte of storage in 2040. Even presuming a major slowdown,
         | we still end up running into 64-bit addressing as a practical
         | limit, perhaps sooner than you think.
        
         | Spooky23 wrote:
         | Storage and interconnect specs are getting a lot faster. I
         | could see a world where you treated an S3 scale storage system
         | as a giant tiered addressable memory space. AS/400 systems sort
         | of did something like that at a small scale.
        
         | eterevsky wrote:
         | 8 EiB of data (in case we want addresses to fit in signed
         | integers) is around 20 metric tons of micro-SD cards of one TB
         | each (assuming they weigh 2 g apiece). This could probably fit
         | in a single shipping container.
        
           | hnuser123456 wrote:
           | shipping container = 40x8x8 ft
           | 
           | microSD card = 15x11x1 mm, 0.5 g
           | 
           | fits 437,503,976 cards = 379 EiB, costs $43.7B, weighs 219
           | metric tons
           | 
           | 8 EiB ~ 10,000,000 TB = fills the shipping container to 2.2%
           | of its height, about 56 mm or 2 inches: 5 metric tons, ~$1B
           | 
           | shipping containers are rated for up to 24 metric tons of
           | cargo, so ~40 EiB, ~$5B, ~10 inches of cards, etc.
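           | 
           | Roughly the same back-of-the-envelope in code, with the card
           | and container numbers assumed above (double precision is
           | plenty for this):
           | 
           |     #include <stdio.h>
           | 
           |     int main(void)
           |     {
           |         /* 40x8x8 ft container, 15x11x1 mm / 0.5 g / 1 TB cards */
           |         double container_mm3 = (40 * 304.8) * (8 * 304.8) * (8 * 304.8);
           |         double card_mm3      = 15.0 * 11.0 * 1.0;
           |         double cards         = container_mm3 / card_mm3;
           |         double eib           = cards * 1e12 / (double)(1ULL << 60);
           | 
           |         printf("%.0f cards, %.0f EiB, %.0f t, $%.1fB\n",
           |                cards, eib, cards * 0.5 / 1e6, cards * 100 / 1e9);
           |         return 0;   /* ~4.4e8 cards, ~380 EiB, ~220 t, ~$44B */
           |     }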
        
             | bombcar wrote:
             | Don't underestimate the bandwidth of a shipping container
             | filled with SD cards.
        
           | MR4D wrote:
           | But the cooling....
           | 
           | Seriously, that was great math you did there, and a neat way
           | to think about volume. That's a standard shipping container
           | [0], which is less than I thought it would be.
           | 
           | [0] -
           | https://www.mobilemodularcontainers.com/products/storage-
           | con...
        
         | tcoppi wrote:
         | Even assuming you are correct on all these points, ASLR is
         | still an important use case, and the effective security of
         | current 64-bit address spaces is low.
        
         | anotherhue wrote:
         | FYI, such memory tagging has a rich history:
         | https://en.wikipedia.org/wiki/Tagged_architecture
        
         | torginus wrote:
         | And there's another disadvantage to 128-bit pointers - memory
         | size and alignment. It would follow that each struct field
         | would become 16 byte-aligned, and pointers would bloat up as
         | well, leading to even more memory consumption, especially in
         | languages that favor pointer-heavy structures.
         | 
         | This was a major counterargument against 64-bit x86, where the
         | transition came out as a net zero in terms of performance, due
         | to the hit of larger pointer sizes counterbalanced by ISA
         | improvements such as more addressable registers.
         | 
         | Many people in high-performance circles advocate using 32-bit
         | array indices opposed to pointers, to counteract the cache
         | pollution effects.
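         | 
         | A rough illustration of that bloat, assuming a hypothetical
         | 16-byte, 16-byte-aligned pointer type (faked here with
         | __int128; exact sizes depend on the ABI):
         | 
         |     #include <stdint.h>
         |     #include <stdio.h>
         | 
         |     typedef unsigned __int128 ptr128;  /* stand-in for a fat pointer */
         | 
         |     struct node_p64  { uint32_t key; void    *left, *right; };
         |     struct node_p128 { uint32_t key; ptr128   left, right;  };
         |     struct node_i32  { uint32_t key; uint32_t left, right;  };
         | 
         |     int main(void)
         |     {
         |         /* typically prints 24, 48 and 12 on an LP64 target */
         |         printf("%zu %zu %zu\n", sizeof(struct node_p64),
         |                sizeof(struct node_p128), sizeof(struct node_i32));
         |         return 0;
         |     }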
        
           | akira2501 wrote:
           | I figure the cache is going to be your largest disadvantage,
           | and is the primary reason CPUs don't physically implement all
           | address bits and why canonical addressing was required to get
           | all this off the ground in the first place.
        
         | mhh__ wrote:
         | Systems for which pointers are not just integers have come and
         | gone, sadly.
         | 
         | Many mainframes had function pointers which were more like a
         | struct than a pointer.
        
           | apaprocki wrote:
           | Itanium function pointers worked like that, so it could have
           | been the new normal if IA64 hadn't been so crazy on the
           | whole.
        
             | xxpor wrote:
             | A Raymond Chen post about the function pointer craziness:
             | https://devblogs.microsoft.com/oldnewthing/20150731-00/?p=90...
        
           | masklinn wrote:
           | Technically that's still pretty common on the software side:
           | that's what tagged pointers are.
           | 
           | The ObjC/Swift runtime uses that, for instance: the class
           | pointer of an object also contains the refcount and a few
           | flags.
        
             | mhh__ wrote:
             | Pointer tagging is still spiritually an integer (i.e. mask
             | off the bits you guarantee are free for tags via alignment)
             | versus (say) the pointer being multiple addresses into
             | different parts of different types of memory _and_ tags
             | _and_ whatever else.
        
               | masklinn wrote:
               | > Pointer tagging is still spiritually an integer
               | 
               | No, pointer tagging is spiritually a packed structure.
               | That can be a simple union (as it is in ocaml IIRC) but
               | it can be a lot more e.g. the objc "non-pointer isa"
               | ended up with 5 flags (excluding raw isa discriminant)
               | and two additional non-pointer members of 19 and 9 bits.
               | 
                | You mask things on and off to unpack the structure into
                | components the system will accept. Nothing precludes
                | using tagged pointers to discriminate between kinds and
                | locations.
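                | 
                | As a concrete sketch of the idea (a made-up layout, not
                | the real objc isa packing), assuming 16-byte-aligned
                | allocations so the low 4 bits are free for a tag:
                | 
                |     #include <stdint.h>
                | 
                |     /* low 4 bits = kind, the rest = the address itself */
                |     enum kind { KIND_HEAP = 1, KIND_REMOTE = 2, KIND_WEAK = 3 };
                | 
                |     static uintptr_t tag_ptr(void *p, enum kind k)
                |     {
                |         return (uintptr_t)p | (uintptr_t)k;  /* p 16-byte aligned */
                |     }
                | 
                |     static void *untag_ptr(uintptr_t t)
                |     {
                |         return (void *)(t & ~(uintptr_t)0xF);
                |     }
                | 
                |     static enum kind tag_kind(uintptr_t t)
                |     {
                |         return (enum kind)(t & 0xF);
                |     }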
        
         | TheCondor wrote:
         | It's very difficult to see normal computers that normal people
         | use needing it any time soon, I agree. Frontier has 9.2 PB of
         | memory though, so that's 50 bits for a petabyte plus 4 more
         | bits: 54 bits of memory addressability if we wanted to byte-
         | address it all. Looking at it that way, if supercomputers
         | continue to be funded and grow like they have, we're getting
         | shockingly close to 64 bits of addressable memory.
         | 
         | I don't know that that really means we need 128 bits; 80 or
         | 96 bits buys a lot of time, but it's probably worth a little
         | bit of thought.
         | 
         | I don't know how many of you remember the pre-386 days. It was
         | an effort to write interesting programs: 512KB or 640KB of
         | memory to work with, but it was addressed through 16-bit
         | segments, so you were writing code to manage segments and
         | stuff; an extra degree of complexity and a pain to debug.
         | 32 bits seemed like a godsend when it happened. I imagine most
         | of the dorks on here have ripped a blu-ray or transcoded a
         | video from somewhere; it's not super unusual to be dealing with
         | a single file whose bytes cannot be addressed with a 32-bit
         | pointer.
         | 
         | It's all about cost and value. 64 bits is still a staggering
         | amount of memory, but if the protein-folding problems and
         | climate models and what have you need an 80-bit address space
         | to represent the problem space, I would hope that the people
         | building those don't also have to worry about the memory "shoe
         | boxing" problems of yesteryear.
        
         | aidenn0 wrote:
         | > ...which is the sort of address space you might need for
         | byte-level addressing of a warehouse full of high-density HDDs
         | 
         | So if you want to mmap() files stored in your datacenter
         | warehouse, maybe you do need it?
        
         | lostmsu wrote:
         | So 64 bit address is only 1024 16TB HDDs? That number may go
         | down quickly. There is a 100TB SSD already.
        
           | Beltiras wrote:
           | While my sibling comments point out your error, they are not
           | addressing what you are saying. You are absolutely correct.
           | It's conceivable that within a few years this much storage
           | will fit in one device (it might be a full rack of disks, but
           | still).
        
           | ijlx wrote:
           | 1024*1024 16TB HDDs, or 1,048,576.
        
           | oofbey wrote:
           | 1,000,000 drives at 16 TB each I think.
           | 
           | Kilo is 10 bits. Mega 20. Giga 30. Tera 40. 16 TB is 44 bits.
           | A million drives is another 20 bits, so 64.
        
           | infinityio wrote:
           | Not quite - 1024GiB is 1TiB, so it's 1024 x 1024 x 16TiB
           | drives
        
           | lostmsu wrote:
           | Ah, parent already corrected their mistake. The comment I was
           | responding to was saying 16*1024*1024*1GB.
        
         | mort96 wrote:
         | We couldn't introduce a new 'middle' keyword, but could we say
         | 'int' is 32 bit, 'long' is 128 bit and 'short long' is 64
         | bit..?
        
       | MikeHalcrow wrote:
       | I recall sitting in a packed room with over a hundred devs at the
       | 2004 Ottawa Linux Symposium while the topic of the number of
       | filesystem bits was being discussed (link:
       | https://www.linux.com/news/ottawa-linux-symposium-day-2/). I
       | recall people throwing out questions as to why we weren't just
       | jumping to 128 or 256 bits, and at one point someone blurted out
       | something about 1024 bits. Someone then made a comment about the
       | number of atoms in the universe, everyone chuckled, and the
       | discussion moved on. I sensed the feeling in the room was that
       | any talk of 128 bits or more was simply ridiculous. Mind you,
       | this was for storage.
       | 
       | Fast-forward 18 years, and it's fascinating to me to see people
       | now seriously floating the proposal to support 256-bit pointers.
        
       | Blikkentrekker wrote:
       | > _How would this look in the kernel? Wilcox had originally
       | thought that, on a 128-bit system, an int should be 32 bits, long
       | would be 64 bits, and both long long and pointer types would be
       | 128 bits. But that runs afoul of deeply rooted assumptions in the
       | kernel that long has the same size as the CPU's registers, and
       | that long can also hold a pointer value. The conclusion is that
       | long must be a 128-bit type._
       | 
       | Can anyone explain the rationale for not simply naming types
       | after their size? In many programming languages, rather than this
       | arcane terminology, "i16", "i32", "i64", and "i128" simply exist.
        
         | creativemonkeys wrote:
         | I'm sure someone will come along and explain why I have no idea
         | what I'm talking about, but so far my understanding is those
         | names exist because of the difference in CPU word size.
         | Typically "int" represents the natural word size for that CPU,
         | which matches the register size as well, so 'int plus int' is
         | as fast as addition can run by default, on a variety of CPUs.
         | That's one reason chars and shorts are promoted to ints
         | automatically in C.
         | 
         | Let's say you want to work with numbers and you want your
         | program to run as fast as possible. If you specify the number
         | of bits you want, like i32, then on a 64-bit CPU, where the
         | register holding the value has an extra 32 bits available, the
         | compiler must make sure those extra bits are not garbage and
         | cannot influence a subsequent operation (like a signed right
         | shift). So the compiler might be forced to insert an
         | instruction to clear the upper 32 bits, and you end up with two
         | instructions for a single operation, meaning your code now runs
         | slower on that machine.
         | 
         | However, had you used 'int' in your code, the compiler would
         | have chosen to represent those values with a 64bit data type on
         | 64bit machines, and 32bit data type on 32bit machines, and your
         | code would run optimally, regardless of the CPU. This of course
         | means it's up to you to make sure that whatever values your
         | program handles fit in 32bit data types, and sometimes that's
         | difficult to guarantee.
         | 
         | If you decide to have your cake and eat it too by saying "fine,
         | I'll just select i32 or i64 at compile time with a condition"
         | and you add some alias, like "word" -> either i32 or i64, "half
         | word" -> either i16 or i32, etc depending on the target CPU,
         | then congrats, you've just reinvented 'int', 'short', 'long',
         | et.al.
         | 
         | Personally, I'm finding it useful to use fixed integer sizes
         | (e.g. int32_t) when writing and reading binary files, to be
         | able to know how many bytes of data to read when loading the
         | file, but once those values are read, I cast them to (int) so
         | that the rest of the program can use the values optimally
         | regardless of the CPU the program is running on.
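         | 
         | FWIW, C99's <stdint.h> already has spellings for both of those
         | intents, so the alias doesn't need to be reinvented by hand; a
         | minimal sketch:
         | 
         |     #include <stdint.h>
         |     #include <stdio.h>
         | 
         |     int main(void)
         |     {
         |         int32_t      on_disk = 42;  /* exactly 32 bits: files, wire formats */
         |         int_fast32_t counter = 42;  /* "at least 32 bits, whatever is fastest" */
         | 
         |         printf("%zu %zu\n", sizeof on_disk, sizeof counter);
         |         return 0;
         |     }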
        
           | nicoburns wrote:
           | That explains "int", but it doesn't explain short or long or
           | long long. Rust has "usize" for the "int" case, and then
           | fixed sizes for everything else, which works much better. If
           | you want portable software, it's usually more important to
           | know how many bits you have available for your calculation
           | than it is to know how efficiently that calculation will
           | happen.
        
         | masklinn wrote:
         | Legacy, there's lots of dumb stuff in C. As you note, in modern
         | languages the rule is generally to have fixed-size integers.
         | 
         | Though I think there were portability concerns: that world is
         | mostly gone (it remains in some corners of computing, e.g.
         | DSPs), but if you're only using fixed-size integers, what do
         | you do when a platform doesn't have that size? With a more
         | flexible scheme you have fewer issues there; however, as the
         | weirdness landscape contracts, the risk of making technically
         | incorrect assumptions (about relations between type sizes, or
         | the actual limits and behaviour of a given type) starts
         | increasing dramatically.
         | 
         | Finally there's the issue at hand here: even with fixed-size
         | integers, "pointer" is a variable-size datum. So you still need
         | a variable-size integer to go with it. C historically lacking
         | that (nowadays it's called uintptr_t), the kernel made
         | assumptions which are incorrect.
         | 
         | Note that you can still get it wrong even if you try e.g. Rust
         | believes and generally assumes that usize and pointers
         | correspond, but that gets iffy with concepts like pointer
         | provenance, which decouple pointer size and address space.
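         | 
         | A tiny sketch of the guarantee uintptr_t does give (whether
         | long can hold a pointer is exactly the assumption at issue in
         | the article):
         | 
         |     #include <stdint.h>
         |     #include <assert.h>
         | 
         |     int main(void)
         |     {
         |         int x = 7;
         |         /* a void* is guaranteed to survive a round trip through
         |            uintptr_t; no such promise exists for long */
         |         uintptr_t bits = (uintptr_t)(void *)&x;
         |         int *back = (int *)(void *)bits;
         |         assert(back == &x && *back == 7);
         |         return 0;
         |     }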
        
           | mpweiher wrote:
           | > in modern languages the rule is generally to have fixed-
           | size integers.
           | 
           |  _Modern_ languages have unlimited size integers :-)
           | 
           | "Modern" as in "since at least the 80s, more likely 70s".
        
             | fluoridation wrote:
             | Good luck using those to specify the data layout of a
             | network packet.
        
               | mpweiher wrote:
               | Well, for that you'd probably use a specialisation of
               | Integer that's bounded and can thus be represented in a
               | machine word.
        
               | fluoridation wrote:
               | And then you'll be wasting time marshaling data between
               | the stream and your objects because they're not PODs and
               | so you can't just memcpy() onto them.
        
               | josefx wrote:
               | You are supposed to stream your video data as base64
               | encoded xml embedded in a json array.
        
             | kuratkull wrote:
             | Good luck seeing your performance drop off a very sharp
             | cliff if you start using larger numbers than your CPU can
             | fit into a single register.
        
               | mpweiher wrote:
                | Well, in those cases other languages fail.
               | 
               | Either silently with overflows, usually leading to
               | security exploits, or by crashing.
               | 
               | So in either case you are betting that these cases are
               | somewhere between rare and non-existent, particularly for
               | your core/performance intensive code.
               | 
               | Being somewhat slower, probably in very isolated contexts
               | (60-62 bits is quite a bit to overflow), but always
               | correct seems like the better tradeoff.
               | 
                | YMMV. ¯\_(ツ)_/¯
        
           | raverbashing wrote:
           | > Legacy, there's lots of dumb stuff in C.
           | 
           | Yes, this, so much this
           | 
           | Who cares what an 'int' or a 'long' is. Except for things
           | like the size of a pointer, it's better if you know exactly
           | what you're working with.
        
         | PaulHoule wrote:
         | C dates back to a time when the 8 bit byte didn't have 100%
         | market share.
        
           | yetihehe wrote:
           | Plus, it was a language for writing systems, where "size of a
           | register on the current machine" was a nice shortcut for
           | "int", and registers could be anywhere from 8-32 bits, with
           | 48 or 12 also a possibility.
        
             | masklinn wrote:
             | Except that's not been true in a while, and technically
             | this assumption was not kosher for even longer: C itself
             | only guarantees that int is at least 16 bits.
        
             | cestith wrote:
             | I have a couple of 12-bit machines upstairs. There were
             | also 36-bit systems once upon a time.
        
           | mhh__ wrote:
           | C still (I think; C23 may have finally killed support)
           | supports architectures like the Unisys ClearPath mainframes,
           | which have a 36-bit word, a 9-bit byte, a 36-bit (IIRC) data
           | pointer and an 81-bit function pointer.
           | 
           | The changes to the arithmetic rules mean you can't have sign-
           | magnitude or ones' complement anymore, IIRC.
        
         | wongarsu wrote:
         | That's pretty much the mentioned proposal of "just use rust
         | types", which are i16/u16 to i128/u128, plus usize/isize for
         | pointer-sized things.
         | 
         | The only improvement that you really need over that is to
         | differentiate between what C calls size_t and uintptr_t: the
         | size of the largest possible array, and the size of a pointer.
         | On "normal" architectures they're the same, but on
         | architectures that do pointer tagging or segmented memory a
         | pointer might be bigger than the biggest possible array.
         | 
         | But you still have to deal with legacy C code, and C was dreamt
         | up when running code written for 16 bits on a 14 bit
         | architecture without losing speed was a consideration, so the C
         | types are weird.
        
           | thrown_22 wrote:
           | stdint.h has been around far longer than Rust.
           | 
           | I've been using those since the '00s for bit-banging code
           | where I need guarantees about where each bit goes.
           | 
           | Nothing quite like working with a microprocessor with 12-bit
           | words to make you appreciate 2^n addresses.
        
         | quonn wrote:
         | I think that's because of portability. So that the common types
         | just map to the correct size on a given system.
        
         | Stamp01 wrote:
         | C99 specifies stdint.h/inttypes.h as part of the standard
         | library for exactly this purpose. I'd expect using it would be
         | a best practice at this point. But I'm no C expert, so maybe
         | there's a good reason for not always using those explicitly
         | sized types.
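         | 
         | For example, the exact-width types pair with the <inttypes.h>
         | format macros, which is the other half of using them portably;
         | a minimal sketch:
         | 
         |     #include <stdint.h>
         |     #include <inttypes.h>
         |     #include <stdio.h>
         | 
         |     int main(void)
         |     {
         |         uint64_t bytes = UINT64_C(1) << 40;  /* 1 TiB */
         |         /* PRIu64 expands to the right printf length modifier */
         |         printf("%" PRIu64 "\n", bytes);
         |         return 0;
         |     }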
        
         | pjmlp wrote:
         | Windows has macros for that kind of stuff, and only in C99 did
         | the stdint header come to be.
         | 
         | So you had almost three decades with everyone coming up with
         | their own solution.
         | 
         | To be fair, the other languages were hardly any better than C
         | in this regard.
        
         | m0RRSIYB0Zq8MgL wrote:
         | That is what was suggested in the next paragraph.
         | 
         | > But a better solution might just be to switch to Rust types,
         | where i32 is a 32-bit, signed integer, while u128 would be
         | unsigned and 128 bits. This convention is close to what the
         | kernel uses already internally, though a switch from "s" to "i"
         | for signed types would be necessary. Rust has all the types we
         | need, he said, it would be best to just switch to them.
        
         | [deleted]
        
       | PaulDavisThe1st wrote:
       | I remember a quote from the papers about Opal, an experimental OS
       | that was intended to use h/w protection rather than virtual
       | memory, so that all processes share the same address space and
       | can just exchange pointers to share data.
       | 
       | "A 64 bit memory space is large enough that if a process
       | allocated 1MB every second, it could continue doing this until
       | significantly past the expected lifetime of the sun before it ran
       | into problems"
        
       | torginus wrote:
       | On a somewhat tangential note, RAM capacity for a given cost used
       | to increase exponentially, until the 2010s or so.
       | 
       | Since then, the price per byte has only roughly halved. What
       | happened?
       | 
       | https://jcmit.net/memoryprice.htm
       | 
       | I know it's not process geometry, since we went from 45nm to 5nm
       | in that time, roughly an 81x reduction in area.
       | 
       | Is it realistic to assume scaling will resume?
        
         | hnuser123456 wrote:
         | We decided to slow down giving programmers excuses to make chat
         | applications as heavy as web browsers
        
       | jaimehrubiks wrote:
        
         | dredmorbius wrote:
         | _As long as the posting of subscriber links in places like this
         | is occasional, I believe it serves as good marketing for LWN -
         | indeed, every now and then, I even do it myself. We just hope
         | that people realize that we run nine feature articles every
         | week, all of which are instantly accessible to LWN
         | subscribers._
         | 
         | -- Jonathan Corbet, LWN founder and grumpy editor-in-chief
         | 
         | <https://news.ycombinator.com/item?id=1966033>
         | 
         | Multiple other approvals:
         | <https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...>
         | 
         | Jon's own submissions:
         | <https://news.ycombinator.com/submitted?id=corbet>
         | 
         | And if we look for SubscriberLink submissions with significant
         | (>20 comments) discussion ... they're showing up every few
         | weeks, largely as Jon had requested.
         | 
         | <https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...>
         | 
         | That said, those who are able to comfortably subscribe and find
         | this information useful: please _do_ support the site through
         | subscriptions.
        
           | nequo wrote:
           | Whether the posting of subscriber links is "occasional" as of
           | late is debatable.[1] Most of LWN's paywalled content is
           | posted on HN.
           | 
           | [1] https://news.ycombinator.com/item?id=32926853
        
             | dredmorbius wrote:
             | jaimehrubiks stated unequivocally without substantiation
             | that "Somebody asked before to please not share lwn's
             | SubscriberLinks". LWN's founder & editor has repeatedly
             | stated otherwise, _hasn 't_ criticised the practice, and
             | participates in the practice himself, as recently as three
             | months ago.
             | 
             | SubscriberLinks are tracked by the LWN account sharing
             | them. Abuse can be managed through LWN directly should that
             | become an issue. Whether or not that's occurred in the past
             | I've no idea, but the capability still exists and is
             | permitted.
             | 
             | No link substantiating jaimehrubiks' assertion seems to
             | have been supplied yet.
             | 
             | I'm going to take Corbet's authority on this.
        
               | nequo wrote:
               | Corbet repeatedly used the word "occasionally," sometimes
               | even with emphasis.
               | 
               | What I'm saying is that the current situation is that
               | most of the for-pay content of LWN is available on HN
               | which is at odds either with his wish that it be
               | occasional or with my understanding of English.
        
               | dredmorbius wrote:
               | Most of those submissions die in the queue.
               | 
               | I'd set a 20-comment limit to the search I presented for
               | a reason. At present, the 30 results shown go back over 7
               | months. That's roughly a significant submission per week.
               | 
               | Contrasting a search for "lwn.net" alone in submissions,
               | the first page of results (sorted by date, again, 30
               | results) only goes back 3 weeks (22 days). But most of
               | those get little activity --- some upvotes, and a few
               | with many comments, but, in a third search sorted by
               | popularity over the past month,
               | 
                | <https://hn.algolia.com/?dateRange=pastMonth&page=0&prefix=tr...>
               | 
               | Ten of those meet or beat my 20-comment threshold, 20
               | don't. And note that 20 comments isn't especially
               | significant, 4 submissions exceed 100 comments.
               | 
                | lwn SubscriberLink & > 20 comments, by date:
                | <https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...>
               | 
               | All "lwn.net" for past month: <https://hn.algolia.com/?da
               | teRange=pastMonth&page=0&prefix=tr...>
               | 
               | Data:
               | 
               | Comments: 189 94 155 254 153 29 46 10 14 20 13 12 89 10
               | 21 1 8 0 1 1 0 2 0 0 0 0 2 1 1 0
               | 
               | Points: 306 271 254 240 166 114 109 89 62 58 53 45 42 39
               | 37 30 20 7 5 5 5 4 4 4 4 4 3 3 3 3
               | 
               | I'm not saying that the concern doesn't exist. But
               | ultimately, it's LWN's to address. The constant
               | admonishments to _not_ share links seem to fall into
               | tangential annoyances and generic tangents, both against
               | HN guidelines:
               | <https://news.ycombinator.com/newsguidelines.html>
               | 
               | I'd suggest leaving this to Corbet and dang.
        
       | munro wrote:
       | There was a post awhile back from NASA saying how many digits of
       | Pi they actually need [1].
       | 
       |     import math
       | 
       |     pi = 3141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587006606315588174881520920962829254091715364367892590360
       |     sign_bits = 1
       |     sig_bits = math.ceil(math.log2(pi))
       |     exp_bits = math.floor(math.log2(sig_bits))
       |     assert sign_bits + sig_bits + exp_bits == 1209
       | 
       | I'm sure I got something wrong here, def off-by-one, but roughly
       | it looks like it would need 1209-bit floats (2048-bit rounded
       | up!). IDK, mildly interesting. :>
       | 
       | [1] https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimal...
        
         | PaulDavisThe1st wrote:
         | The size of required data types is mostly orthogonal to the
         | size of memory addresses or filesystem offsets.
        
         | jabl wrote:
         | Pi is a bit special because in order to get accurate argument
         | reduction for trigonometric functions you need lots of digits
         | (IIRC ~1000 for double precision).
         | 
         | E.g. https://redirect.cs.umbc.edu/~phatak/645/supl/Ng-
         | ArgReductio...
        
         | vanderZwan wrote:
         | The value of Pi you mention was the one in the question, the
         | one in the answer is:
         | 
         | > _For JPL 's highest accuracy calculations, which are for
         | interplanetary navigation, we use 3.141592653589793. Let's look
         | at this a little more closely to understand why we don't use
         | more decimal places. I think we can even see that there are no
         | physically realistic calculations scientists ever perform for
         | which it is necessary to include nearly as many decimal points
         | as you present._
         | 
         | That's sixteen digits, so a quick trip to the dev tools tells
         | me:
         | 
         |     >> Math.log2(3141592653589793)
         |     -> 51.480417552782754
         | 
         | The last statement of the text I quoted is more interesting
         | though. Although not surprising to me, given how many
         | astronomers I know who joke that Pi equals three all the time.
        
           | munro wrote:
           | Lol I should RTFA ;D
        
             | vanderZwan wrote:
             | Nah, just claim you were invoking Cunningham's Law ;)
        
           | Beltiras wrote:
           | I have horrid memories of debugging I had to do to get some
           | god-awful fourier transform to calculate with 15 digits of
           | precision to fit a spec. It's right at the boundary where
           | double-precision stops being deterministic. Worst debugging
           | week of my life.
        
             | vanderZwan wrote:
             | > _stops being deterministic_
             | 
             | I'm imagining the maths equivalent of Heisenbugs, is that
             | correct?
        
               | Beltiras wrote:
               | No, just having to match how Matlab did the calculation
               | (development of an index) to implementing the _same
                | thing_ in C++ (necessitating showing the calculation
                | returned the same significant digits for the precision
                | we expected). I've seen a Heisenbug and that was really
                | weird. It happened during uni so I didn't have to start
               | tracing down compiler bugs. Not even sure if I could,
               | happened with Java.
        
       | rmorey wrote:
       | this seems just a bit too early - so that probably means it's
       | exactly the right time!
        
         | marktangotango wrote:
           | I was wondering if this (128-bit memory) is on the radar of
           | any of the BSDs. Will they forever be stuck at 64 bits?
        
           | fanf2 wrote:
           | CheriBSD might be the first Unix-like with 128-bit pointers.
        
       | amelius wrote:
       | Perhaps it's an idea to make Linux parameterized over the
       | pointer/word size, and let the compiler figure it out in the
       | future.
        
       | bitwize wrote:
       | Pointers will get fat (CHERI and other tagged pointer schemes)
       | well before even server users will need to byte-address into more
       | than 2^64 bytes' worth of stuff. So we should probably be
       | realistically aiming for _256-bit_ architectures...
        
       | t-3 wrote:
       | Are there operations that vector processors are inherently worse
       | at or much harder to program for? Nowadays they seem to be mainly
       | used for specialized tasks like graphics and machine learning
       | accelerators, but given the expansion of SIMD instruction sets,
       | are general purpose vector CPUs in the pipeline anywhere?
        
       ___________________________________________________________________
       (page generated 2022-09-23 23:00 UTC)