[HN Gopher] Compilers and IRs: LLVM IR, SPIR-V, and MLIR
       ___________________________________________________________________
        
       Compilers and IRs: LLVM IR, SPIR-V, and MLIR
        
       Author : matt_d
       Score  : 36 points
       Date   : 2022-10-29 19:19 UTC (3 hours ago)
        
 (HTM) web link (www.lei.chat)
 (TXT) w3m dump (www.lei.chat)
        
       | thechao wrote:
       | I have an irrational dislike of SPIR-V. On the flip side of the
       | coin I think MLIR is a work of genius -- especially as a
       | springboard for ideas in developing custom IR.
        
         | fooker wrote:
         | MLIR is also siloing compiler research and development.
         | 
          | Everyone and their mother has their own proprietary MLIR
         | dialect nowadays, and the era of competitive open source
         | compilers is sort of fading.
        
       | k4st wrote:
       | At Trail of Bits, we are creating a new compiler front/middle end
       | for Clang called VAST [1]. It consumes Clang ASTs and creates a
       | high-level, information-rich MLIR dialect. Then, we progressively
       | lower it through various other dialects, eventually down to the
       | LLVM dialect in MLIR, which can be translated directly to LLVM
       | IR.
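        | 
        | As a rough sketch of what that progressive lowering can look
        | like (illustrative only, not VAST's actual dialect syntax), a C
        | function such as
        | 
        |     int add(int a, int b) { return a + b; }
        | 
        | might first be represented with ops and types that still carry
        | the C-level view, e.g.
        | 
        |     // hypothetical high-level dialect; C types kept explicit
        |     hl.func @add(%a: !hl.int, %b: !hl.int) -> !hl.int {
        |       %0 = hl.add %a, %b : !hl.int
        |       hl.return %0 : !hl.int
        |     }
        | 
        | and only at the very end reach MLIR's LLVM dialect, which maps
        | essentially one-to-one onto LLVM IR:
        | 
        |     llvm.func @add(%a: i32, %b: i32) -> i32 {
        |       %0 = llvm.add %a, %b : i32
        |       llvm.return %0 : i32
        |     }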
       | 
        | Our goals with this pipeline are to enable static analyses that
        | can choose the right abstraction level(s) for their purposes
        | and, using provenance, to cross abstraction levels and relate
        | results back to source code.
       | 
        | Neither Clang ASTs nor LLVM IR alone meets our needs for static
       | analysis. Clang ASTs are too verbose and lack explicit
       | representations for implicit behaviours in C++. LLVM IR isn't
        | really "one IR"; it's two IRs (LLVM proper and metadata),
       | where LLVM proper is an unspecified family of dialects (-O0, -O1,
       | -O2, -O3, then all the arch-specific stuff). LLVM IR also isn't
       | easy to relate to source, even in the presence of maximal debug
        | information. The Clang codegen process does ABI-specific lowering
        | that takes high-level types/values and transforms them to be more
        | amenable to storing in target-CPU locations (e.g. registers).
        | This actively works against relating information across levels,
        | which is something we want to solve with intermediate MLIR
        | dialects.
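        | 
        | As a concrete (and roughly sketched) example of that ABI
        | lowering: for a small struct passed by value,
        | 
        |     struct Point { int x; int y; };
        |     int sum(struct Point p) { return p.x + p.y; }
        | 
        | Clang targeting x86-64 coerces the argument into a single i64,
        | so the optimized LLVM IR ends up looking roughly like
        | 
        |     define i32 @sum(i64 %p.coerce) {
        |       %x   = trunc i64 %p.coerce to i32
        |       %hi  = lshr i64 %p.coerce, 32
        |       %y   = trunc i64 %hi to i32
        |       %sum = add i32 %x, %y
        |       ret i32 %sum
        |     }
        | 
        | The struct type and its field names are simply gone from the
        | IR, which is exactly the kind of information we want the
        | intermediate dialects to preserve.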
       | 
       | Beyond our static analysis goals, I think an MLIR-based setup
       | will be a key enabler of library-aware compiler optimizations.
       | Right now, library-aware optimizations are challenging because
       | Clang ASTs are hard to mutate, and by the time things are in LLVM
       | IR, the abstraction boundaries provided by libraries are broken
       | down by optimizations (e.g. inlining, specialization, folding),
       | forcing optimization passes to reckon with the mechanics of how
       | libraries are implemented.
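        | 
        | A tiny illustration of that breakdown (roughly sketched): a
        | call like std::min(a, b) is still visibly a library call in the
        | Clang AST, but after inlining and folding the LLVM IR is just
        | 
        |     define i32 @f(i32 %a, i32 %b) {
        |       %cmp = icmp slt i32 %a, %b
        |       %min = select i1 %cmp, i32 %a, i32 %b
        |       ret i32 %min
        |     }
        | 
        | so a pass that wants to reason about "std::min" has nothing
        | left to latch onto.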
       | 
       | We're very excited about MLIR, and we're pushing full steam ahead
       | with VAST. MLIR is a technology that we can use to fix a lot of
       | issues in Clang/LLVM that hinder really good static analysis.
       | 
       | [1] https://github.com/trailofbits/vast
        
         | erichocean wrote:
         | > _LLVM dialect in MLIR, which can be translated directly to
         | MLIR_
         | 
         | Should be:
         | 
         | LLVM dialect in MLIR, which can be translated directly to _LLVM
         | IR_
         | 
         | Otherwise, great project! We're also using MLIR internally and
          | it's been awesome, game-changing even, considering how much can
          | be accomplished with a reasonable amount of effort.
        
           | k4st wrote:
           | Typo fixed! Thanks :-)
           | 
           | I think the next big problems for MLIR to address are things
           | like: metadata/location maintenance when integrating with
           | third-party dialects and transformations. With LLVM
           | optimizations, getting the optimization right has always
           | seemed like the top priority, and then maybe getting metadata
           | propagation working came a distant second.
           | 
            | I think the opportunity with MLIR is that metadata/location
            | info can be the old ops themselves, or ops from other
            | dialects. In our work, we want a tower/progression of IRs,
            | and we want them _simultaneously_ in memory, all living
            | together. You could think of the debug metadata for a
            | lower-level dialect being the higher-level dialect. This is
            | why I sometimes think about LLVM IR as really being two IRs:
            | LLVM "code" and metadata nodes. Metadata nodes in LLVM IR
            | can represent arbitrary structures, but lack concrete
            | checks/balances. MLIR fixes this by unifying the
            | representations, bringing in structure while retaining
            | flexibility.
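            | 
            | Concretely (a sketch of the idea rather than an existing
            | mechanism), MLIR already lets every op carry a location,
            | e.g.
            | 
            |     %0 = llvm.add %a, %b : i32 loc("point.c":7:14)
            | 
            | and locations can be fused and nested. The extension we
            | want is for that "location" role to be played by the
            | higher-level op the value was lowered from, so lowering
            | never loses the link back up the tower.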
        
       | manv1 wrote:
        | Funny that there was no mention of GCC, since its IR was probably
        | one of the first that anyone encountered IRL. If I remember
        | correctly, one motivation for Clang/LLVM was that GCC's IR was so
        | bad.
        | 
        | I knew people who wrote backends for gcc, and they pretty much
        | all agreed it was a nightmare.
        
       ___________________________________________________________________
       (page generated 2022-10-29 23:00 UTC)