[HN Gopher] Where the top of the stack is on x86 (2011)
       ___________________________________________________________________
        
       Where the top of the stack is on x86 (2011)
        
       Author : cassepipe
       Score  : 73 points
       Date   : 2021-05-07 15:51 UTC (7 hours ago)
        
 (HTM) web link (eli.thegreenplace.net)
 (TXT) w3m dump (eli.thegreenplace.net)
        
       | rdhatt wrote:
       | Remember:
       | 
       | "The x86 architecture is the weirdo" --Raymond Chen
       | 
       | https://devblogs.microsoft.com/oldnewthing/20040914-00/?p=37...
       | 
       | https://devblogs.microsoft.com/oldnewthing/20130320-00/?p=48...
        
         | Narishma wrote:
         | In this case, it's not a weirdo, is it? I don't know of any
         | popular ISA with a stack that grows upward.
        
       | colejohnson66 wrote:
       | It also doesn't help that debuggers show the call stack with the
       | current function at the top and the entry point at the bottom
       | (the opposite of how it works in memory).
        
       | TrianguloY wrote:
       | It's easier to understand if your diagram show the starting
       | memory positions (address 0x00000000) at the top, and final
       | positions (address 0xFFFFFFFF) at the bottom. This way the top of
       | the stack is, precisely, at the top.
       | 
       | It doesn't seem to be an standard and there are almost the same
       | number of diagrams with memory from 0 to F than from F to 0. Some
       | of them are more useful in specific contexts, but most are simply
       | drawn the way the author is used to.
       | 
       | Personally I prefer increasing numbers go down, like lines in a
       | file, numbers on a numbered list, timestamp on a log, etc. Having
       | the highest memory value 0xFF on the top seems...odd.
        
         | ellis-bell wrote:
         | but then the heap would grow down in such a diagram...
         | 
         | btw my understanding is that the heap came first in the logical
         | development of C and other systems programming. so it made
         | sense to have .text and other program data at the lowest
         | virtual memory addresses and then have the heap grow towards
         | higher memory addresses.
         | 
         | then when the stack became a thing it had to grow down...
        
           | monocasa wrote:
           | The stack might have actually come before the heap in
           | history, and actually proceeds any systems programming
           | language that was a higher level than assembly. They were
           | seen in the Z4, and Turing wrote about them in 40s.
        
         | anyfoo wrote:
         | We call things "base of memory" and "top of memory" though,
         | with the latter one being at the highest address of memory.
        
         | efaref wrote:
         | I never understood why people ever put memory upside down on
         | diagrams.
         | 
         | In fact, I think the only time I see diagrams with F at the top
         | and 0 at the bottom is in explanations of the stack, or in
         | explanations of how "the stack is so weird it grows the wrong
         | way!".
         | 
         | When describing, e.g., a memory map for a custom ASIC, you
         | would _always_ put the low addresses first and the high
         | addresses later. That 's just how numbers work. The "stack
         | grows the wrong way" issue seems to be an invented problem.
        
           | toast0 wrote:
           | If you're writing a diagram for little endian, you end up
           | with the bytes going right to left, if you have multibyte
           | values, so you may as well read bottom to top, right to left.
        
         | billforsternz wrote:
         | Exactly. After all we write programs by starting at the top of
         | the screen and working our way down, and happily of course
         | program execution increments in the same direction. So small
         | addresses at the top and big addresses at the bottom is the
         | most natural way of presenting any memory map.
        
         | albntomat0 wrote:
         | That way also makes reading/writing a buffer go from top to
         | bottom, similar to reading/writing normally.
        
         | moonchild wrote:
         | From a footnote of TFA:
         | 
         | > You may try to fix the confusion by viewing memory with its
         | low addresses at the top and high addresses at the bottom.
         | While this would indeed make stack movement more natural, it
         | would also mean that increasing some memory address would take
         | it _down_ in the graphical representation, which is probably
         | even more counter-intuitive.
         | 
         | Ultimately, my approach is to avoid directional terms such as
         | 'top' and 'bottom' or 'high' and 'low' in the first place; they
         | only cause confusion. Prefer 'greater' or 'smaller' addresses.
         | 
         | (Similarly 'left' and 'right' when applied to bits, which gets
         | especially confusing if endianness is involved; prefer 'more
         | significant' and 'less significant'.)
        
         | roelschroeven wrote:
         | > Having the highest memory value 0xFF on the top seems...odd
         | 
         | Dunno ... the highest being on top feels very normal ... .
         | Isn't that more or less the definition of "highest" and "top"?
         | 
         | We have conflicting conventions in all kinds situations:
         | 
         | Numbers increase towards the top in a class Cartesian
         | coordinate system, in the numbering of floors in buildings, on
         | calculator keypads, ...
         | 
         | Numbers increase towards the bottom in coordinate systems in
         | many computer graphics contexts, in numbered lists, on
         | telephone keypads, ...
         | 
         | Sometimes these conventions meet and clash and there is no one
         | right way to handle that.
        
         | [deleted]
        
       | azhenley wrote:
       | The title should be: Where the top of the stack is on x86 (2011)
        
       | acchow wrote:
       | Just flip your diagram upside down and there's no confusion.
        
       | amelius wrote:
       | Shouldn't there be two stacks? One to contain stuff like return
       | addresses, and the other to contain data? And the stack
       | containing addresses in a separate address space that can only be
       | accessed through stack manipulation/return instructions.
        
         | moonchild wrote:
         | This has been suggested, including by the proposed mill and
         | forwardcom architectures. It's also the programming model used
         | by stack-based languages such as forth.
         | 
         | Another nice side effect aside from security is that it
         | simplifies the return predictor.
        
       | jonsen wrote:
       | And where does a branch go? It leaves the stem of consecutive
       | memory numbers into nowhere and mysteriously _comes back_ to the
       | stem at an arbitrary number.
        
       | glhaynes wrote:
       | Easy to get confused similarly about trees that grow down in
       | computer scientists' diagrams versus growing up in nature.
        
       | eatonphil wrote:
       | My confusion about statements like this are understanding where
       | the convention begins. Isn't it libc that sets up the stack?
       | Couldn't it decide to set up the stack starting from the bottom
       | of memory and put heap allocations at the top? I guess
       | instructions like PUSH/POP and derivatives wouldn't be useful
       | anymore so you'd have to recreate them. So I guess that means
       | that the convention starts at the processor? You could store
       | memory in the opposite way but it would just be slower since you
       | wouldn't be able to use built in operations. Do I have that
       | right?
        
         | ajross wrote:
         | > Isn't it libc that sets up the stack?
         | 
         | In coordination with the kernel, yes. Different OSes do this
         | differently, but many will set up the stack pointer for you.
         | Linux historically has not, the new thread keeps the parent
         | stack pointer and has to implement careful entry code to switch
         | it without messing up the parent thread.
         | 
         | > Couldn't it decide to set up the stack starting from the
         | bottom of memory and put heap allocations at the top?
         | 
         | In theory, but as you mention the architecture makes some clear
         | demands here. On x86 CALL/RET and PUSH/POP both require a
         | SP/ESP/RSP register with free space below it. Likewise x86
         | interrupts are handled on the stack and so you need to keep the
         | stack pointer initialized for them (modern CPUs will
         | automatically switch the stack for you from whatever user code
         | is running, but they still need to switch it to a grows-down
         | area you initialized for them).
         | 
         | Broadly, sure, you can come up with a software abstraction that
         | acted as a "stack", but it would have to be in addition to the
         | CPU stack you already need. In effect you'd have to burn a
         | register for this extra stack pointer, which has performance
         | implications.
        
           | toast0 wrote:
           | > Likewise x86 interrupts are handled on the stack and so you
           | need to keep the stack pointer initialized for them (modern
           | CPUs will automatically switch the stack for you from
           | whatever user code is running, but they still need to switch
           | it to a grows-down area you initialized for them).
           | 
           | If your kernel is re-entrant, you need to keep platform stack
           | conventions in the kernel, or an interrupt (or exception)
           | during the kernel will overwrite your backwards stack.
           | 
           | I don't know for sure, but I think aignal handing in user
           | processes runs on the user stack too (but I could easily be
           | wrong).
           | 
           | If you really wanted, you could use the platform stack for
           | call/ret only and have a separate data stack with whatever
           | conventions you like.
        
           | monocasa wrote:
           | > Linux historically has not, the new thread keeps the parent
           | stack pointer and has to implement careful entry code to
           | switch it without messing up the parent thread.
           | 
           | Well, for calls to fork(2) and clone(2), sure, but the kernel
           | will setup a stack for you on exec(2). It has to have
           | somewhere to stick the command line args, env, and some other
           | extra bits of information like a random seed.
        
           | eatonphil wrote:
           | That's helpful! Then why is it that libc is the one setting
           | up the stack if this is a convention basically required by
           | the processor?
        
             | ajross wrote:
             | Because the runtime linker (e.g. ld-linux.so) is
             | implemented in the C library, as is the user-callable
             | implementation of pthread_create() or whatever[1]. It's
             | definitely the application's job to decide on its own
             | memory layout in any case. The kernel doesn't tell you
             | where your stack should be, it's your address space.
             | 
             | [1] vs. the clone() Linux syscall, which isn't really
             | useful to regular code because of the complexity.
        
           | astrobe_ wrote:
           | Yes, you clearly see that when you implement a stack-based
           | virtual machine (as in bytecode interpreter, not VMWare and
           | the likes, although it is on the principle the same thing) :
           | your bytecode (or whatever technique you are using except
           | perhaps JIT) for push/pop/call/ret must agree on how to use
           | the stack.
           | 
           | It is particularly the case in a language like Forth, which
           | has two stacks that can be manipulated directly by the user
           | (actually the user _has_ to if they want to get anything
           | done...) : one for the parameters /arguments, and one for the
           | return addresses. This deviates from the usual single
           | hardware stack processors. Forth processors (there are still
           | some in operation) fully support those two stacks.
           | 
           | When you implement a Forth interpreter in assembler, you
           | generally use the hardware stack for either the parameters or
           | for the return addresses, while the other is managed manually
           | (usually using another addressing-capable register such as
           | EBP, ESI or EDI on x86).
           | 
           | If for instance you dedicate one segment (either in the x86
           | sense or in the common meaning), you typically use the push
           | down "hardware" stack that you initialize at the top of
           | memory, while the return addresses stack is a "push up"
           | software-managed stack and initialized at the bottom of the
           | memory.
        
         | ylyn wrote:
         | Yes, more or less. I'm not sure if push and pop is
         | significantly faster than a mov and add. But there are also
         | instructions like leave and ret that follow this same fully
         | descending stack convention. So if you deviate from that you
         | can't use any of those.
         | 
         | It's worth mentioning what fully descending means here. I'm
         | surprised the article didn't mention it.
         | 
         | "Fully" means that the stack pointer points to the last entry
         | on the stack (the top). "Empty" means it points to the next
         | entry just after the last (top) entry. That is, it points to
         | where the next push would be written to.
         | 
         | Anyway, you can have whatever calling convention you want,
         | really. All you need is a way to pass arguments, return values,
         | and to know where to return to. You could have a linked list of
         | frames allocated by malloc() with the head pointer in EBP, for
         | example. Say the return address and caller frame pointer is
         | stored in the start of the frame. Then you would return by mov
         | ebx, [ebp]; mov ebp, [ebp+4]; jmp ebx. Or something lile that.
         | This is a pretty ridiculous calling convention though, but it'd
         | work.
        
         | AshamedCaptain wrote:
         | In ARM you do have pop/push (and call/ret) instructions that
         | can go in either direction, and most platforms I know still
         | have a stack that grows to 0.
         | 
         | TBH, dunno where the difficulty is.
        
           | monocasa wrote:
           | Sort of. The ARM stuff was allowed other uses because they
           | were pretty generic load and store multiple instructions with
           | a lot of increment/decrement options to make up for the
           | original Acorn's lack of a DMA engine.
           | 
           | On anything resembling a recent ARM core though, using the
           | stack pointer register with anything other than a descending
           | stack has you falling off the perf wagon as it's backed by a
           | hardware stack engine.
        
       | JoeAltmaier wrote:
       | There is an argument for making the call/return stack a separate
       | non-addressable region from argument passing. So a malicious app
       | can't overwrite return addresses and execute arbitrary code.
       | Intel considered this early on, and rejected it.
       | 
       | Why? Because of Fortran. There are (were?) cases where a Fortran
       | app would reach back on the stack to retrieve values from before
       | the last call or some such. Rather than find some other way of
       | making that work for those folks, the separate-return-stack idea
       | was shelved.
       | 
       | And we all live with this debacle for decades.
        
         | stevemk14ebr wrote:
         | Checkout the shadow stack used for CET (control flow
         | enforcement technology). It is literally this idea
        
         | Someone wrote:
         | I don't understand that argument. All function arguments still
         | would be on a single stack (just as in Forth). That would only
         | be a valid argument if some FORTRAN code inspected the return
         | address, for example to behave differently, depending on the
         | caller.
         | 
         | I would guess it's more because early systems had small address
         | spaces. If your heap grows up from the bottom of memory, and
         | one stack grows down from top of your address space, how do you
         | know where to place that second stack, especially in a system
         | with, say, a 64kB address space?
        
           | JoeAltmaier wrote:
           | Yes, exactly. Some FORTRAN code did exactly that.
           | 
           | And the stack could go in a page not addressable by general
           | purpose instructions, save call and return instructions (and
           | perhaps some kernel debug).
           | 
           | Some limited-RAM embedded devices have dedicated stack RAM.
           | But my point was about Intel x86
        
         | monocasa wrote:
         | If that was an issue, they managed to fix it decades ago.
         | Itanium had separate return address and data stacks, and
         | scientific computing was one of it's few competitive
         | strongsuits.
        
         | titzer wrote:
         | You cannot address the call stack in WebAssembly, thus there
         | can be no stack-smashing attacks (that overwrite return
         | addresses). For languages that pass pointers into the stack,
         | they must use a "shadow" stack allocated as a separate region
         | and managed with an explicit stack pointer.
        
       | Denvercoder9 wrote:
       | Is there a reason why a stack growing towards the beginning of
       | memory is better than one growing towards the end of memory, or
       | is that just an arbitrary choice the x86 designers made?
        
         | vlmutolo wrote:
         | I wonder if it's at all related to the reasoning in this post
         | about writing a bump allocator.
         | 
         | https://fitzgeraldnick.com/2019/11/01/always-bump-downwards....
        
         | bonzini wrote:
         | It's mostly because it allows you to put code at a fixed
         | address at the beginning of the memory and the stack at the
         | opposite end. That lets the OS place the stack at a constant
         | address (for MS-DOS COM files the stack pointer starts at
         | 0xFFFE, with a zero already pushed so that a RET instruction
         | exits the program; a similar convention existed in CP/M).
        
         | pcwalton wrote:
         | I don't know if this is the real reason, but it's more natural
         | for stack-relative addressing to have the stack grow downwards,
         | because you can write e.g. [rsp+10] instead of [rsp-10].
         | 
         | Apparently on PA-RISC (hppa) the stack grows the other way, so
         | it is arbitrary.
        
           | tom_mellior wrote:
           | Why is rsp+10 more natural? If rsp is the top of the stack,
           | everything is below it, and "below" and "minus" go well
           | together.
           | 
           | Though I would agree that for the topmost value specifically,
           | rsp+0 feels more natural than rsp-4 or rsp-8.
        
       | xanathar wrote:
       | Bonus (?): the stack going down rather than up means that
       | overflowing a stack-allocated buffer will overwrite the contents
       | that are already on the stack (as they have a higher address than
       | the last item in the buffer), likely changing the return address
       | of the function and thus making arbitrary code execution a breeze
       | (see: https://en.wikipedia.org/wiki/Return-to-libc_attack).
       | 
       | Most operating systems and standard libraries have checks and
       | countermeasures to make this a lot harder nowadays, but still.
        
         | ajross wrote:
         | In fact the mitigations have been so effective that in practice
         | stack smash attacks are mostly historical at this point. But
         | yes: having the direction of "natural memory copies" be the
         | same direction as "back into the caller memory on the stack"
         | was clearly a really bad mistake in hindsight.
         | 
         | I actually don't know where it came from. It was true in
         | original PDP-11 Unix for sure: you had the program text
         | followed by a grows-up heap, with the grows-down stack placed
         | at the top of the program segment. Interestingly PDP-11
         | addressing was general enough to have implemented a grows-up
         | stack, so this is clearly a mistake Unix could have corrected.
         | I just don't know if this was the original use of the
         | convention or if it inherited it from elsewhere.
        
           | vlovich123 wrote:
           | Interestingly, per my reading of [1], these attacks are now
           | easily available again in WASM due to misplaced confidence in
           | the security of the language. It's a fantastic paper.
           | 
           | [1]
           | https://www.unibw.de/patch/papers/usenixsecurity20-wasm.pdf
        
           | xanathar wrote:
           | I _THINK_ that the reason is: 8086 inherited it from the 8085
           | which inherited it from the 8080. The next parent in line
           | would be the 8008, but that has a small call stack in the CPU
           | registers rather than RAM, so the ancestor would be the 8080.
           | 
           | The 8080 had 64KB address space. I bet the rationale was to
           | partition memory so that classic memory usage goes upwards
           | from 0000 to FFFF and the stack goes downwards from FFFF to
           | 0000. This removes the need to define a boundary between the
           | two beforehand.
           | 
           | Of course this is totally speculation on my part, I might be
           | super wrong.
        
             | bonzini wrote:
             | The 8008 instruction set was not designed by Intel IIRC, so
             | the lineage ends at the 8080.
             | 
             | Independently the 6502 also had a downward stack. I think
             | the only modern machine with am upward-growing stack is the
             | HP PA-RISC.
        
       ___________________________________________________________________
       (page generated 2021-05-07 23:00 UTC)