[HN Gopher] Where the top of the stack is on x86 (2011)
___________________________________________________________________
Where the top of the stack is on x86 (2011)

Author : cassepipe
Score  : 73 points
Date   : 2021-05-07 15:51 UTC (7 hours ago)

(HTM) web link (eli.thegreenplace.net)
(TXT) w3m dump (eli.thegreenplace.net)

| rdhatt wrote:
| Remember:
|
| "The x86 architecture is the weirdo" --Raymond Chen
|
| https://devblogs.microsoft.com/oldnewthing/20040914-00/?p=37...
|
| https://devblogs.microsoft.com/oldnewthing/20130320-00/?p=48...
| Narishma wrote:
| In this case, it's not a weirdo, is it? I don't know of any
| popular ISA with a stack that grows upward.
| colejohnson66 wrote:
| It also doesn't help that debuggers show the call stack with the
| current function at the top and the entry point at the bottom
| (the opposite of how it works in memory).
| TrianguloY wrote:
| It's easier to understand if your diagram shows the starting
| memory positions (address 0x00000000) at the top, and final
| positions (address 0xFFFFFFFF) at the bottom. This way the top of
| the stack is, precisely, at the top.
|
| It doesn't seem to be a standard and there are almost the same
| number of diagrams with memory from 0 to F as from F to 0. Some
| of them are more useful in specific contexts, but most are simply
| drawn the way the author is used to.
|
| Personally I prefer increasing numbers go down, like lines in a
| file, numbers on a numbered list, timestamps in a log, etc. Having
| the highest memory value 0xFF on the top seems...odd.
| ellis-bell wrote:
| but then the heap would grow down in such a diagram...
|
| btw my understanding is that the heap came first in the logical
| development of C and other systems programming. so it made
| sense to have .text and other program data at the lowest
| virtual memory addresses and then have the heap grow towards
| higher memory addresses.
|
| then when the stack became a thing it had to grow down...
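The layout described above (heap growing toward greater addresses, stack growing toward smaller ones, both sharing one region) can be sketched as a toy model in C. This is only an illustration under stated assumptions: `arena_t`, `arena_heap_alloc`, `arena_push` and `arena_pop` are hypothetical names, the "address space" is a 64-byte array, and no real OS or libc lays out memory through code like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 64  /* tiny on purpose so a collision is easy to trigger */

/* Hypothetical model of one flat region shared by a heap and a stack. */
typedef struct {
    uint8_t mem[ARENA_SIZE];
    size_t heap_end;   /* first free offset above the heap; grows upward */
    size_t stack_top;  /* offset of the most recently pushed byte; grows downward */
} arena_t;

static void arena_init(arena_t *a) {
    a->heap_end = 0;            /* heap starts at the smallest offset */
    a->stack_top = ARENA_SIZE;  /* an empty stack sits just past the largest offset */
}

/* Bump-allocate n bytes from the low end; fails if it would hit the stack. */
static bool arena_heap_alloc(arena_t *a, size_t n, size_t *offset_out) {
    if (n > a->stack_top - a->heap_end) return false;
    *offset_out = a->heap_end;
    a->heap_end += n;
    return true;
}

/* Push one byte: decrement first, then store; fails if it would hit the heap. */
static bool arena_push(arena_t *a, uint8_t value) {
    if (a->stack_top == a->heap_end) return false;
    a->stack_top--;
    a->mem[a->stack_top] = value;
    return true;
}

/* Pop one byte: load first, then increment; fails on an empty stack. */
static bool arena_pop(arena_t *a, uint8_t *out) {
    if (a->stack_top == ARENA_SIZE) return false;
    *out = a->mem[a->stack_top];
    a->stack_top++;
    return true;
}
```

Neither side needs a preset limit in this model: the free space is simply whatever lies between `heap_end` and `stack_top`, and either operation fails only when the two meet.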
| monocasa wrote:
| The stack might have actually come before the heap in
| history, and actually precedes any systems programming
| language that was a higher level than assembly. They were
| seen in the Z4, and Turing wrote about them in the '40s.
| anyfoo wrote:
| We call things "base of memory" and "top of memory" though,
| with the latter one being at the highest address of memory.
| efaref wrote:
| I never understood why people ever put memory upside down on
| diagrams.
|
| In fact, I think the only time I see diagrams with F at the top
| and 0 at the bottom is in explanations of the stack, or in
| explanations of how "the stack is so weird it grows the wrong
| way!".
|
| When describing, e.g., a memory map for a custom ASIC, you
| would _always_ put the low addresses first and the high
| addresses later. That's just how numbers work. The "stack
| grows the wrong way" issue seems to be an invented problem.
| toast0 wrote:
| If you're writing a diagram for little endian, you end up
| with the bytes going right to left, if you have multibyte
| values, so you may as well read bottom to top, right to left.
| billforsternz wrote:
| Exactly. After all we write programs by starting at the top of
| the screen and working our way down, and happily of course
| program execution increments in the same direction. So small
| addresses at the top and big addresses at the bottom is the
| most natural way of presenting any memory map.
| albntomat0 wrote:
| That way also makes reading/writing a buffer go from top to
| bottom, similar to reading/writing normally.
| moonchild wrote:
| From a footnote of TFA:
|
| > You may try to fix the confusion by viewing memory with its
| low addresses at the top and high addresses at the bottom.
| While this would indeed make stack movement more natural, it
| would also mean that increasing some memory address would take
| it _down_ in the graphical representation, which is probably
| even more counter-intuitive.
|
| Ultimately, my approach is to avoid directional terms such as
| 'top' and 'bottom' or 'high' and 'low' in the first place; they
| only cause confusion. Prefer 'greater' or 'smaller' addresses.
|
| (Similarly 'left' and 'right' when applied to bits, which gets
| especially confusing if endianness is involved; prefer 'more
| significant' and 'less significant'.)
| roelschroeven wrote:
| > Having the highest memory value 0xFF on the top seems...odd
|
| Dunno ... the highest being on top feels very normal ...
| Isn't that more or less the definition of "highest" and "top"?
|
| We have conflicting conventions in all kinds of situations:
|
| Numbers increase towards the top in a classic Cartesian
| coordinate system, in the numbering of floors in buildings, on
| calculator keypads, ...
|
| Numbers increase towards the bottom in coordinate systems in
| many computer graphics contexts, in numbered lists, on
| telephone keypads, ...
|
| Sometimes these conventions meet and clash and there is no one
| right way to handle that.
| [deleted]
| azhenley wrote:
| The title should be: Where the top of the stack is on x86 (2011)
| acchow wrote:
| Just flip your diagram upside down and there's no confusion.
| amelius wrote:
| Shouldn't there be two stacks? One to contain stuff like return
| addresses, and the other to contain data? And the stack
| containing addresses in a separate address space that can only be
| accessed through stack manipulation/return instructions.
| moonchild wrote:
| This has been suggested, including by the proposed Mill and
| ForwardCom architectures. It's also the programming model used
| by stack-based languages such as Forth.
|
| Another nice side effect aside from security is that it
| simplifies the return predictor.
| jonsen wrote:
| And where does a branch go? It leaves the stem of consecutive
| memory numbers into nowhere and mysteriously _comes back_ to the
| stem at an arbitrary number.
| glhaynes wrote:
| Easy to get confused similarly about trees that grow down in
| computer scientists' diagrams versus growing up in nature.
| eatonphil wrote:
| My confusion about statements like this is about where
| the convention begins. Isn't it libc that sets up the stack?
| Couldn't it decide to set up the stack starting from the bottom
| of memory and put heap allocations at the top? I guess
| instructions like PUSH/POP and derivatives wouldn't be useful
| anymore so you'd have to recreate them. So I guess that means
| that the convention starts at the processor? You could store
| memory in the opposite way but it would just be slower since you
| wouldn't be able to use built-in operations. Do I have that
| right?
| ajross wrote:
| > Isn't it libc that sets up the stack?
|
| In coordination with the kernel, yes. Different OSes do this
| differently, but many will set up the stack pointer for you.
| Linux historically has not: the new thread keeps the parent
| stack pointer and has to implement careful entry code to switch
| it without messing up the parent thread.
|
| > Couldn't it decide to set up the stack starting from the
| bottom of memory and put heap allocations at the top?
|
| In theory, but as you mention the architecture makes some clear
| demands here. On x86 CALL/RET and PUSH/POP both require a
| SP/ESP/RSP register with free space below it. Likewise x86
| interrupts are handled on the stack and so you need to keep the
| stack pointer initialized for them (modern CPUs will
| automatically switch the stack for you from whatever user code
| is running, but they still need to switch it to a grows-down
| area you initialized for them).
|
| Broadly, sure, you can come up with a software abstraction that
| acts as a "stack", but it would have to be in addition to the
| CPU stack you already need. In effect you'd have to burn a
| register for this extra stack pointer, which has performance
| implications.
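The "software abstraction" mentioned above might look roughly like the following C sketch of a stack that grows toward greater indices and is kept entirely separate from the CPU stack. The names `soft_stack_t`, `soft_push` and `soft_pop` are hypothetical, and the `sp` field merely stands in for the extra register a real implementation would have to dedicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SOFT_STACK_SLOTS 8

/* Hypothetical upward-growing stack managed purely in software. */
typedef struct {
    uint32_t slots[SOFT_STACK_SLOTS];
    size_t sp;  /* index of the next free slot; stands in for the extra register */
} soft_stack_t;

static void soft_init(soft_stack_t *s) {
    s->sp = 0;  /* an empty stack starts at the smallest index */
}

/* Store at sp, then move sp up; fails when the stack is full. */
static bool soft_push(soft_stack_t *s, uint32_t value) {
    if (s->sp == SOFT_STACK_SLOTS) return false;
    s->slots[s->sp++] = value;
    return true;
}

/* Move sp down, then load; fails when the stack is empty. */
static bool soft_pop(soft_stack_t *s, uint32_t *out) {
    if (s->sp == 0) return false;
    *out = s->slots[--s->sp];
    return true;
}
```

Every push and pop here is an explicit store or load plus pointer arithmetic, whereas PUSH/POP and CALL/RET update the hardware stack pointer implicitly.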
| toast0 wrote:
| > Likewise x86 interrupts are handled on the stack and so you
| need to keep the stack pointer initialized for them (modern
| CPUs will automatically switch the stack for you from
| whatever user code is running, but they still need to switch
| it to a grows-down area you initialized for them).
|
| If your kernel is re-entrant, you need to keep platform stack
| conventions in the kernel, or an interrupt (or exception)
| taken while in the kernel will overwrite your backwards stack.
|
| I don't know for sure, but I think signal handling in user
| processes runs on the user stack too (but I could easily be
| wrong).
|
| If you really wanted, you could use the platform stack for
| call/ret only and have a separate data stack with whatever
| conventions you like.
| monocasa wrote:
| > Linux historically has not, the new thread keeps the parent
| stack pointer and has to implement careful entry code to
| switch it without messing up the parent thread.
|
| Well, for calls to fork(2) and clone(2), sure, but the kernel
| will set up a stack for you on exec(2). It has to have
| somewhere to stick the command line args, env, and some other
| extra bits of information like a random seed.
| eatonphil wrote:
| That's helpful! Then why is it that libc is the one setting
| up the stack if this is a convention basically required by
| the processor?
| ajross wrote:
| Because the runtime linker (e.g. ld-linux.so) is
| implemented in the C library, as is the user-callable
| implementation of pthread_create() or whatever[1]. It's
| definitely the application's job to decide on its own
| memory layout in any case. The kernel doesn't tell you
| where your stack should be, it's your address space.
|
| [1] vs. the clone() Linux syscall, which isn't really
| useful to regular code because of the complexity.
| astrobe_ wrote:
| Yes, you clearly see that when you implement a stack-based
| virtual machine (as in bytecode interpreter, not VMware and
| the like, although in principle it is the same thing):
| your bytecode (or whatever technique you are using except
| perhaps JIT) for push/pop/call/ret must agree on how to use
| the stack.
|
| It is particularly the case in a language like Forth, which
| has two stacks that can be manipulated directly by the user
| (actually the user _has_ to if they want to get anything
| done...): one for the parameters/arguments, and one for the
| return addresses. This deviates from the usual single
| hardware stack processors. Forth processors (there are still
| some in operation) fully support those two stacks.
|
| When you implement a Forth interpreter in assembler, you
| generally use the hardware stack for either the parameters or
| for the return addresses, while the other is managed manually
| (usually using another addressing-capable register such as
| EBP, ESI or EDI on x86).
|
| If for instance you dedicate one segment (either in the x86
| sense or in the common meaning), you typically use the push
| down "hardware" stack that you initialize at the top of
| memory, while the return addresses stack is a "push up"
| software-managed stack and initialized at the bottom of the
| memory.
| ylyn wrote:
| Yes, more or less. I'm not sure if push and pop are
| significantly faster than a mov and add. But there are also
| instructions like leave and ret that follow this same full
| descending stack convention. So if you deviate from that you
| can't use any of those.
|
| It's worth mentioning what full descending means here. I'm
| surprised the article didn't mention it.
|
| "Full" means that the stack pointer points to the last entry
| on the stack (the top). "Empty" means it points to the next
| free slot just past the last (top) entry. That is, it points to
| where the next push would be written to.
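The "full" versus "empty" distinction described above can be made concrete with a small C model of both descending variants. This is a sketch only: `stack_model_t` and the `fd_`/`ed_` function names are made up for illustration, and array indices stand in for addresses. The x86 PUSH instruction decrements the stack pointer and then stores, which matches the full descending variant here.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 4

/* Hypothetical toy stack: mem[] plays the role of memory, sp of the stack pointer. */
typedef struct {
    uint32_t mem[SLOTS];
    int sp;
} stack_model_t;

/* Full descending: sp indexes the last pushed entry. */
static void fd_init(stack_model_t *s) { s->sp = SLOTS; }  /* one past the highest slot */

static void fd_push(stack_model_t *s, uint32_t v) {
    assert(s->sp > 0);
    s->mem[--s->sp] = v;   /* decrement first, then store */
}

static uint32_t fd_pop(stack_model_t *s) {
    assert(s->sp < SLOTS);
    return s->mem[s->sp++];  /* load first, then increment */
}

/* Empty descending: sp indexes the slot the next push will write. */
static void ed_init(stack_model_t *s) { s->sp = SLOTS - 1; }  /* the highest slot itself */

static void ed_push(stack_model_t *s, uint32_t v) {
    assert(s->sp >= 0);
    s->mem[s->sp--] = v;   /* store first, then decrement */
}

static uint32_t ed_pop(stack_model_t *s) {
    assert(s->sp < SLOTS - 1);
    return s->mem[++s->sp];  /* increment first, then load */
}
```

After one push, the full variant finds the top entry at `mem[sp]` while the empty variant finds it at `mem[sp + 1]`; the data ends up in the same slot either way, and only the meaning of `sp` differs.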
|
| Anyway, you can have whatever calling convention you want,
| really. All you need is a way to pass arguments, return values,
| and to know where to return to. You could have a linked list of
| frames allocated by malloc() with the head pointer in EBP, for
| example. Say the return address and caller frame pointer are
| stored at the start of the frame. Then you would return by mov
| ebx, [ebp]; mov ebp, [ebp+4]; jmp ebx. Or something like that.
| This is a pretty ridiculous calling convention, but it'd
| work.
| AshamedCaptain wrote:
| In ARM you do have pop/push (and call/ret) instructions that
| can go in either direction, and most platforms I know still
| have a stack that grows to 0.
|
| TBH, dunno where the difficulty is.
| monocasa wrote:
| Sort of. The ARM stuff was allowed other uses because they
| were pretty generic load and store multiple instructions with
| a lot of increment/decrement options to make up for the
| original Acorn's lack of a DMA engine.
|
| On anything resembling a recent ARM core though, using the
| stack pointer register with anything other than a descending
| stack has you falling off the perf wagon as it's backed by a
| hardware stack engine.
| JoeAltmaier wrote:
| There is an argument for making the call/return stack a separate
| non-addressable region from argument passing. So a malicious app
| can't overwrite return addresses and execute arbitrary code.
| Intel considered this early on, and rejected it.
|
| Why? Because of Fortran. There are (were?) cases where a Fortran
| app would reach back on the stack to retrieve values from before
| the last call or some such. Rather than find some other way of
| making that work for those folks, the separate-return-stack idea
| was shelved.
|
| And we have all lived with this debacle for decades.
| stevemk14ebr wrote:
| Check out the shadow stack used for CET (control flow
| enforcement technology). It is literally this idea.
| Someone wrote:
| I don't understand that argument.
| All function arguments still
| would be on a single stack (just as in Forth). That would only
| be a valid argument if some FORTRAN code inspected the return
| address, for example to behave differently, depending on the
| caller.
|
| I would guess it's more because early systems had small address
| spaces. If your heap grows up from the bottom of memory, and
| one stack grows down from the top of your address space, how do you
| know where to place that second stack, especially in a system
| with, say, a 64kB address space?
| JoeAltmaier wrote:
| Yes, exactly. Some FORTRAN code did exactly that.
|
| And the stack could go in a page not addressable by general
| purpose instructions, save call and return instructions (and
| perhaps some kernel debug).
|
| Some limited-RAM embedded devices have dedicated stack RAM.
| But my point was about Intel x86.
| monocasa wrote:
| If that was an issue, they managed to fix it decades ago.
| Itanium had separate return address and data stacks, and
| scientific computing was one of its few competitive
| strong suits.
| titzer wrote:
| You cannot address the call stack in WebAssembly, thus there
| can be no stack-smashing attacks (that overwrite return
| addresses). For languages that pass pointers into the stack,
| they must use a "shadow" stack allocated as a separate region
| and managed with an explicit stack pointer.
| Denvercoder9 wrote:
| Is there a reason why a stack growing towards the beginning of
| memory is better than one growing towards the end of memory, or
| is that just an arbitrary choice the x86 designers made?
| vlmutolo wrote:
| I wonder if it's at all related to the reasoning in this post
| about writing a bump allocator.
|
| https://fitzgeraldnick.com/2019/11/01/always-bump-downwards....
| bonzini wrote:
| It's mostly because it allows you to put code at a fixed
| address at the beginning of the memory and the stack at the
| opposite end.
| That lets the OS place the stack at a constant
| address (for MS-DOS COM files the stack pointer starts at
| 0xFFFE, with a zero already pushed so that a RET instruction
| exits the program; a similar convention existed in CP/M).
| pcwalton wrote:
| I don't know if this is the real reason, but it's more natural
| for stack-relative addressing to have the stack grow downwards,
| because you can write e.g. [rsp+10] instead of [rsp-10].
|
| Apparently on PA-RISC (hppa) the stack grows the other way, so
| it is arbitrary.
| tom_mellior wrote:
| Why is rsp+10 more natural? If rsp is the top of the stack,
| everything is below it, and "below" and "minus" go well
| together.
|
| Though I would agree that for the topmost value specifically,
| rsp+0 feels more natural than rsp-4 or rsp-8.
| xanathar wrote:
| Bonus (?): the stack going down rather than up means that
| overflowing a stack-allocated buffer will overwrite the contents
| that are already on the stack (as they have a higher address than
| the last item in the buffer), likely changing the return address
| of the function and thus making arbitrary code execution a breeze
| (see: https://en.wikipedia.org/wiki/Return-to-libc_attack).
|
| Most operating systems and standard libraries have checks and
| countermeasures to make this a lot harder nowadays, but still.
| ajross wrote:
| In fact the mitigations have been so effective that in practice
| stack smash attacks are mostly historical at this point. But
| yes: having the direction of "natural memory copies" be the
| same direction as "back into the caller memory on the stack"
| was clearly a really bad mistake in hindsight.
|
| I actually don't know where it came from. It was true in
| original PDP-11 Unix for sure: you had the program text
| followed by a grows-up heap, with the grows-down stack placed
| at the top of the program segment.
| Interestingly PDP-11
| addressing was general enough to have implemented a grows-up
| stack, so this is clearly a mistake Unix could have corrected.
| I just don't know if this was the original use of the
| convention or if it inherited it from elsewhere.
| vlovich123 wrote:
| Interestingly, per my reading of [1], these attacks are now
| easily available again in WASM due to misplaced confidence in
| the security of the language. It's a fantastic paper.
|
| [1]
| https://www.unibw.de/patch/papers/usenixsecurity20-wasm.pdf
| xanathar wrote:
| I _THINK_ that the reason is: 8086 inherited it from the 8085
| which inherited it from the 8080. The next parent in line
| would be the 8008, but that has a small call stack in the CPU
| registers rather than RAM, so the ancestor would be the 8080.
|
| The 8080 had a 64KB address space. I bet the rationale was to
| partition memory so that classic memory usage goes upwards
| from 0000 to FFFF and the stack goes downwards from FFFF to
| 0000. This removes the need to define a boundary between the
| two beforehand.
|
| Of course this is total speculation on my part, I might be
| super wrong.
| bonzini wrote:
| The 8008 instruction set was not designed by Intel IIRC, so
| the lineage ends at the 8080.
|
| Independently the 6502 also had a downward stack. I think
| the only modern machine with an upward-growing stack is the
| HP PA-RISC.
___________________________________________________________________
(page generated 2021-05-07 23:00 UTC)