[HN Gopher] My own C compiler on my own compiler infrastructure
       ___________________________________________________________________
        
       My own C compiler on my own compiler infrastructure
        
       Author : uint256_t
       Score  : 113 points
       Date   : 2020-10-23 14:43 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | mshockwave wrote:
       | I'm pretty excited to see the infrastructure (i.e. cilk). But I'm
       | wondering if it "fixes" some of the flaw in LLVM IR? For example,
       | (not really a flaw but I rather prefer) using basic block
       | parameters instead of PHI nodes.
        
         | jcranmer wrote:
         | There's no theoretical difference between phis and basic block
         | parameters.
         | 
         | When you do SSA conversion, you need a notion that allows you
         | to bind a value to a variable along an edge in the control-flow
         | graph. You can choose to actually have some edge data that
         | indicates these bindings, but this tends to be a poor
         | engineering solution. If it has to go into a basic block, you
         | need to either bind it to the source basic block or the target
         | basic block. Bind it to the target, and you get phi nodes. Bind
         | it to the source, and you get basic block parameters.
         | 
         | The choice of which to use is an engineering tradeoff. Phis
         | have the issue that they are pseudo-instructions existing as
         | the leading instructions in a basic block, and you may well
         | need to special-case them in transformation or analysis passes.
         | Using basic block arguments also provides a more direct analogy
         | to interprocedural optimization, and you could potentially use
         | the exact same code with a touch of specialization to handle
         | that domain.
         | 
         | Since the two forms are moving the edge instructions to
         | different places, they make different questions easier. With a
         | phi node, it's quick to see if all predecessors are providing
         | the same value. With a basic block argument, it's quick to see
         | which variables are live out (assuming you're restricted to
         | using only basic block-local variables [1]). In general, if you
         | need to query all-predecessor analyses, it's easier with phis;
         | if you need all-successor analyses, it's easier with basic
         | block arguments.
         | 
         | [1] The choice of where variables can be used is _also_
         | orthogonal to phis versus basic block arguments. Loop-closed
         | SSA already exists, which prevents you from using a variable
         | outside of its loop.
        
           | tom_mellior wrote:
           | > There's no theoretical difference between phis and basic
           | block parameters.
           | 
           | I question this claim. Don't basic block parameters allow you
           | to do interesting stuff also at _split_ points, not only at
           | _join_ points?
           | 
           | For example:                   if (x == 0) {             y =
           | 2 * x;         } else {             y = 3 * x;         }
           | 
           | In SSA form this is:                       if (x == 0) jump
           | B1 else jump B2         B1:             y1 = 2 * x
           | jump join         B2:             y2 = 3 * x             jump
           | join         join:             y = phi(y1, y2)
           | 
           | You need conditional constant propagation to realize that y1
           | must be 0.
           | 
           | But with basic block arguments you could do this (improvising
           | notation on the spot):                       if (x == 0) jump
           | B1(0) else jump B2(x)         B1(arg1):             y1 = 2 *
           | arg1;             jump join(y1)         B2(arg2):
           | y2 = 3 * arg2;             jump join(y2)         join(y):
           | ...
           | 
           | Basically, as soon as information becomes available (x is 0
           | along some path), that is explicit in the IR. Realizing that
           | y1 is 0 only takes simple, non-conditional sparse constant
           | propagation. You might not call that a "theoretical"
           | difference (what theory are we talking about?), but it does
           | give you a simple way to access information that is not
           | explicit in SSA form.
           | 
           | In effect this latter representation is something like Static
           | Single Information (SSI) form: https://citeseerx.ist.psu.edu/
           | viewdoc/summary?doi=10.1.1.1.9...
        
             | jcranmer wrote:
             | As I explained in my footnote, you can do the same thing
             | with SSA form:                   if (x == 0) jump B1 else
             | jump B2       B1:         x1 = phi(x)         y1 = 2 * x1
             | jump join       B2:         x2 = phi(x)         y2 = 3 * x2
             | jump join       join:         y = phi(y1, y2)
             | 
             | And then you can do the same propagation of the x == 0 to
             | the phi in B1 pretty easily. It is somewhat harder than the
             | basic-block argument case, but that's because basic block
             | arguments rely on the data binding being put in the head of
             | the edge and we're pushing data from the head into the
             | edge. The next step of checking if all incoming edges are
             | identically 0 would be easier in the phi-form, since the
             | data binding is in the tail node, and we're pulling data
             | from the edge into the tail.
        
         | uint256_t wrote:
         | (I'm the owner of cilk.) I didn't know the basic block
         | parameters. It seems better than using phi nodes. Thanks.
        
           | mshockwave wrote:
           | Yeah basic block parameters are especially useful when doing
           | data flow analysis
        
         | tptacek wrote:
         | Huh! I stopped learnin' compilery stuff after SSA and was
         | unaware of the approach. If anyone else is wondering, here's a
         | good Reddit thread on bblock parameters:
         | 
         | https://www.reddit.com/r/ProgrammingLanguages/comments/9z8qu...
        
           | mhh__ wrote:
           | It's relatively new in the mainstream of compiler's, i.e.
           | MSVC only got SSA in its backend 5 years ago or so.
           | 
           | I think Appel wrote a paper comparing SSA to functional
           | programming way back in the 90s.
        
             | bonzini wrote:
             | It's also included in his books ("Modern compiler
             | programming in <insert language name>", where the language
             | is only used in the first third of the book).
        
       | einpoklum wrote:
       | It says the compiler is "Powered by Cilk". But - what does that
       | mean?
       | 
       | https://en.wikipedia.org/wiki/Cilk
       | 
       | isn't Cilk a C-like language with additions for parallel
       | execution? How is that a compiler infrastructure?
        
         | sergeykish wrote:
         | cilk in dependencies, should search "rust cilk"
         | 
         | https://github.com/maekawatoshiki/cilk
        
         | Retr0spectrum wrote:
         | See the parent directory of the repo linked in the OP:
         | https://github.com/maekawatoshiki/cilk
        
           | einpoklum wrote:
           | Ah, interesting, thanks.
           | 
           | Well, the example shows convenient syntax... but - it also
           | says it's a "toy" project.
        
         | alcidesfonseca wrote:
         | I agree that Cilk is not a great name for compiler stuff as
         | there is already a Cilk extension to C.
         | 
         | That said, it's the author's prerogative.
        
       ___________________________________________________________________
       (page generated 2020-10-23 23:00 UTC)