[HN Gopher] How Python bytecode is executed
       ___________________________________________________________________
        
       How Python bytecode is executed
        
       Author : r4victor
       Score  : 76 points
       Date   : 2020-11-08 12:48 UTC (1 day ago)
        
 (HTM) web link (tenthousandmeters.com)
        
       | g42gregory wrote:
       | This looks like a great resource. I always wanted to know how
       | CPython is implemented.
        
       | DonaldFisk wrote:
       | > The UNARY_NEGATIVE opcode pops value from the stack, negates it
       | and pushes the result.
       | 
       | Why not have a top of stack register? Then all you need to put in
       | your case statement (naively) is
       | 
       |     tos = -tos;
       | 
       | thereby avoiding a pop and a push. In a virtual machine for a
       | dynamically typed language such as Python, you'll need to handle
       | different types, and for a statically typed language you'll need
       | separate instructions for int and float, but in any case you
       | avoid the pop and push. If the instruction takes more than one
       | item from the stack, you at least have one fewer pop.
       | 
       | Incidentally, Burroughs mainframes were hardware stack machines
       | and had two top of stack registers, A and B.
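
The top-of-stack register idea above can be sketched in a few lines. This is a toy interpreter written for illustration only, not CPython code: the opcode names `PUSH`, `NEG` and `ADD`, the `run` function and the `(op, arg)` program format are all assumptions made for this example.

```python
def run(program):
    """Run a tiny stack program, caching the top of stack in a local variable."""
    stack = []   # holds every value below the top of stack
    tos = None   # the hypothetical "top of stack register"
    depth = 0    # number of live values, counting tos

    for op, arg in program:
        if op == "PUSH":
            if depth > 0:
                stack.append(tos)      # spill the old top to memory
            tos = arg
            depth += 1
        elif op == "NEG":
            tos = -tos                 # no pop and no push
        elif op == "ADD":
            tos = stack.pop() + tos    # one pop instead of two pops and a push
            depth -= 1
        else:
            raise ValueError(f"unknown opcode: {op}")
    return tos


program = [("PUSH", 2), ("PUSH", 5), ("NEG", None), ("ADD", None)]
result = run(program)
print(result)  # 2 + (-5) = -3
```

The cost of the register shows up in `PUSH`, which has to spill the previous top before loading a new one, so whether the trade pays off depends on the instruction mix.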
        
         | teraflop wrote:
         | In fact, that's exactly how Python implements UNARY_NEGATIVE:
         | https://github.com/python/cpython/blob/v3.9.0/Python/ceval.c...
         | 
         | If you look a bit further down in the original article, you'll
         | see that the BINARY_ADD instruction does something similar. It
         | pops (a pointer to) the first operand, and modifies (a pointer
         | to) the second one in-place.
         | 
         | Semantically, it makes sense to define operations as popping
         | the operand(s) and pushing a result, for simplicity. But
         | there's no reason the interpreter has to actually be
         | implemented that way, as long as the observable behavior is the
         | same.
         | 
         | In any case, I wouldn't be surprised if an extra push/pop ended
         | up having very little performance impact. The compiler might be
         | able to optimize away the pointer increment/decrement
         | instructions, and if not, the stack pointer is pretty much
         | guaranteed to be in the L1 cache.
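
A rough Python model of the point above: "pop then push" and "overwrite the top slot" leave the stack in the same state. The function names are invented for this sketch, and a plain list stands in for the value stack; the real implementation is C code operating on object pointers.

```python
import operator


def unary_negative_pop_push(stack):
    """Textbook semantics: pop the operand, push the result."""
    value = stack.pop()
    stack.append(operator.neg(value))


def unary_negative_in_place(stack):
    """Overwrite the top slot; the stack height never changes."""
    stack[-1] = operator.neg(stack[-1])


def binary_add_in_place(stack):
    """Pop the right operand, then overwrite the slot holding the left one."""
    right = stack.pop()
    stack[-1] = operator.add(stack[-1], right)


a, b = [3, 4], [3, 4]
unary_negative_pop_push(a)
unary_negative_in_place(b)
print(a == b)  # True: both stacks are now [3, -4]

binary_add_in_place(b)
print(b)  # [-1]
```

Since nothing can observe the stack between the pop and the push, the two unary variants are indistinguishable from the program's point of view.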
        
       | r4victor wrote:
       | Hi! This is part 4 of my Python behind the scenes series. The
       | goal of this post is to understand how the CPython VM executes
       | Python bytecode. You'll learn:
       | 
       | - what is the evaluation loop and how it's implemented
       | - when and how a thread may stop executing the bytecode to
       |   release the GIL
       | - how CPython computes things
       | - how CPython handles exceptions and implements statements
       |   like try-except, try-finally and with
       | 
       | I appreciate your feedback! Thanks!
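
For readers who want to see the instructions the evaluation loop consumes, the standard-library `dis` module lists them. A minimal sketch follows; `negate` is a made-up function for this example, and the exact instruction sequence differs between CPython versions, so no output is shown.

```python
import dis


def negate(x):
    return -x


# Collect the opcode names the compiler emitted for negate().
opnames = [ins.opname for ins in dis.get_instructions(negate)]
print(opnames)
```

Calling `dis.dis(negate)` instead prints a human-readable listing with offsets and arguments.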
        
         | heinrichhartman wrote:
         | Really enjoyed the read! Thanks for taking the time to write
         | this down.
         | 
         | It's easy to forget that if you are running "Python" in
         | production, you are actually running CPython on an x86 VM
         | configured with a bunch of *.py files. When things go sideways,
         | you might find yourself in a situation where knowing CPython
         | internals becomes relevant.
        
         | borishn wrote:
         | Thanks Victor, this is really useful and well written.
        
         | alexpetralia wrote:
         | This looks like great information. Thanks for sharing!
        
       | fulafel wrote:
       | Would the loop be amenable to speedups from parallel execution?
       | Some kind of parallel idempotent run ahead version of the loop
       | might at least prime caches and resolve some dynamic dispatch
       | stuff in advance. A bit like runahead execution in CPU design.
        
         | tachyonbeam wrote:
         | Might also punt things you need out of the cache if it runs too
         | far ahead?
        
       ___________________________________________________________________
       (page generated 2020-11-09 23:02 UTC)