[HN Gopher] How Python bytecode is executed
___________________________________________________________________
How Python bytecode is executed

Author : r4victor
Score  : 76 points
Date   : 2020-11-08 12:48 UTC (1 days ago)

(HTM) web link (tenthousandmeters.com)
(TXT) w3m dump (tenthousandmeters.com)

| g42gregory wrote:
| This looks like a great resource. I always wanted to know how
| CPython is implemented.

| DonaldFisk wrote:
| > The UNARY_NEGATIVE opcode pops value from the stack, negates it
| and pushes the result.
|
| Why not have a top of stack register? Then all you need to put in
| your case statement (naively) is
|
|     tos = -tos;
|
| thereby avoiding a pop and a push. In a virtual machine for a
| dynamically typed language such as Python, you'll need to handle
| different types, and for a statically typed language you'll need
| separate instructions for int and float, but in any case you
| avoid the pop and push. If the instruction takes more than one
| item from the stack, you at least have one fewer pop.
|
| Incidentally, Burroughs mainframes were hardware stack machines
| and had two top of stack registers, A and B.
  | teraflop wrote:
  | In fact, that's exactly how Python implements UNARY_NEGATIVE:
  | https://github.com/python/cpython/blob/v3.9.0/Python/ceval.c...
  |
  | If you look a bit further down in the original article, you'll
  | see that the BINARY_ADD instruction does something similar. It
  | pops (a pointer to) the first operand, and modifies (a pointer
  | to) the second one in-place.
  |
  | Semantically, it makes sense to define operations as popping
  | the operand(s) and pushing a result, for simplicity. But
  | there's no reason the interpreter has to actually be
  | implemented that way, as long as the observable behavior is the
  | same.
  |
  | In any case, I wouldn't be surprised if an extra push/pop ended
  | up having very little performance impact. The compiler might be
  | able to optimize away the pointer increment/decrement
  | instructions, and if not, the stack pointer is pretty much
  | guaranteed to be in the L1 cache.

| r4victor wrote:
| Hi! This is part 4 of my Python behind the scenes series. The
| goal of this post is to understand how the CPython VM executes
| Python bytecode. You'll learn:
|
| - what the evaluation loop is and how it's implemented
| - when and how a thread may stop executing the bytecode to
|   release the GIL
| - how CPython computes things
| - how CPython handles exceptions and implements statements like
|   try-except, try-finally and with
|
| I appreciate your feedback! Thanks!

| heinrichhartman wrote:
| Really enjoyed the read! Thanks for taking the time to write
| this down.
|
| It's easy to forget that if you are running "Python" in
| production, you are actually running CPython on an x86 VM
| configured with a bunch of *.py files. When things go sideways,
| you might find yourself in a situation where knowing CPython
| internals becomes relevant.

| borishn wrote:
| Thanks Victor, this is really useful and well written.

| alexpetralia wrote:
| This looks like great information. Thanks for sharing!

| fulafel wrote:
| Would the loop be amenable to speedups from parallel execution?
| Some kind of parallel idempotent run ahead version of the loop
| might at least prime caches and resolve some dynamic dispatch
| stuff in advance. A bit like runahead execution in cpu design.
  | tachyonbeam wrote:
  | Might also punt things you need out of the cache if it runs too
  | far ahead?
___________________________________________________________________
(page generated 2020-11-09 23:02 UTC)
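
Illustration of the DonaldFisk / teraflop exchange: the point that an opcode can be specified as "pop the operands, push the result" while being implemented by overwriting the top stack slot can be sketched with a toy stack machine. This is a minimal sketch in Python, not CPython's actual C code; the function names `run_pop_push` and `run_in_place`, the list-based stack, and the string opcodes are all assumptions made up for the example.

```python
# Toy stack machine: a hypothetical illustration, not CPython's ceval.c.
# Opcodes are plain strings and the value stack is a Python list whose
# last element is the top of the stack.

def run_pop_push(program, stack):
    """Execute opcodes by literally popping operands and pushing results."""
    stack = list(stack)  # work on a copy so the caller's list is untouched
    for op in program:
        if op == "UNARY_NEGATIVE":
            value = stack.pop()
            stack.append(-value)
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


def run_in_place(program, stack):
    """Execute the same opcodes, overwriting the top slot where possible."""
    stack = list(stack)
    for op in program:
        if op == "UNARY_NEGATIVE":
            stack[-1] = -stack[-1]         # no pop and no push
        elif op == "BINARY_ADD":
            right = stack.pop()            # one pop instead of two
            stack[-1] = stack[-1] + right  # overwrite the remaining operand
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


PROGRAM = ["UNARY_NEGATIVE", "BINARY_ADD", "UNARY_NEGATIVE"]
INITIAL = [10, 3, 4]

if __name__ == "__main__":
    # Both strategies leave the stack in the same observable state.
    print(run_pop_push(PROGRAM, INITIAL))
    print(run_in_place(PROGRAM, INITIAL))
```

Both functions produce the same final stack for the same program, which is the sense in which the observable behavior is unchanged. Whether skipping the extra pop and push matters for speed in a real interpreter is a separate question that this sketch does not measure.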