[HN Gopher] A complete compiler and VM in 150 lines of code
       ___________________________________________________________________
        
       A complete compiler and VM in 150 lines of code
        
       Author : p4bl0
       Score  : 62 points
       Date   : 2022-07-16 20:51 UTC (2 hours ago)
        
 (HTM) web link (gist.github.com)
 (TXT) w3m dump (gist.github.com)
        
       | userbinator wrote:
       | For those wondering what language this is for before clicking on
       | it: it's not much more than what I think would traditionally be
       | called an expression evaluator, and _very_ simple one at that ---
       | it only recognises binary logic expressions, and the 150 lines
       | doesn 't include the parser or lexer either. Not a bad attempt
       | for what is presumably a first try, but I think "complete
       | compiler" is overselling it.
        
         | a-dub wrote:
         | there's no type checking, which i'd say is a fairly integral
         | part of a compiler...
         | 
         | but it does use a parser generator to generate a parser that
         | constructs an ast and it then iterates that ast to generate a
         | bytecode.
         | 
         | that's basically a compiler, even if the language is basic
         | expressions.
        
           | userbinator wrote:
           | B has no type checking either.
           | 
           | On the other hand, using a parser generator and not counting
           | its output in the line count does seem like cheating.
        
             | p4bl0 wrote:
             | > using a parser generator and not counting its output in
             | the line count does seem like cheating
             | 
             | Yeah maybe I should count OCaml's standard library lines of
             | code too. Maybe also count expanded macros and inlined
             | functions at each call site? At which point should I stop?
             | Are assembly language pseudo-instructions cheating too?
             | 
             | Common, don't be ridiculous.
        
         | p4bl0 wrote:
         | Hey there, author here :). The language is intentionally _very_
         | simple indeed, because the goal is to showcase all
         | indispensable steps of a complete compiler. I initially wrote
         | it for my students as an introduction to my compiler class (for
         | the very first session).
         | 
         | > would traditionally be called an expression evaluator
         | 
         | It is not an evaluator, evaluators are included (the different
         | eval functions).
         | 
         | > the 150 lines doesn't include the parser or lexer either
         | 
         | This is just wrong:                   $ cat *.ml{,l,y} *.c |
         | sed '/^\s*$/d' | wc -l         149
         | 
         | and these 149 lines (okay, 169 if you count blank lines which
         | are removed using sed in the count above) include the two
         | `eval` and the `run` functions which are not actually part of
         | the compiler but here for clarity (they help understanding how
         | the successive intermediate representations of the boolean
         | expressions work -- without them the entire project is less
         | than 130 lines).
         | 
         | > Not a bad attempt for what is presumably a first try
         | 
         | It's not ;). It is however targeted at beginner compsci
         | students who have absolutely no idea how compilers are built
         | yet :).
         | 
         | > but I think "complete compiler" is overselling it.
         | 
         | It does lexing & parsing to produce an AST of a boolean
         | expression, then a traversal of this AST to transform it into a
         | second intermediate representation (an AST of NAND-only
         | expression), then _compiles_ the expression into a list of
         | instructions (and this paradigm shift is a big step that is
         | tough to teach to some students, believe me ^^), and then emits
         | the corresponding bytecode (which here is 1:1 with the
         | imperative instructions), and then the VM interprets this
         | bytecode. I don 't think I'm overselling it :).
         | 
         | The only way it is not complete -- and that's written in the
         | README -- is that it has absolutely no semantics analysis
         | (there is only one type, and no names, so there is nothing to
         | analyze ^^), but that is not _mandatory_ to be called a
         | compiler.
         | 
         | Then again, this is only the first hour of a semester long
         | compiler class (which covers semantic analysis, don't worry)
         | ;).
        
           | userbinator wrote:
           | _The only way it is not complete -- and that 's written in
           | the README -- is that it has absolutely no semantics
           | analysis_
           | 
           | I was referring to the complete lack of flow control. Could
           | you compile the compiler with itself, for example?
           | 
           |  _Then again, this is only the first hour of a semester long
           | compiler class_
           | 
           | If you're taking a course, you might enjoy "leaping ahead" by
           | studying C4 and C4x86. They are also VM-based, and _self-
           | compiling_ , yet still tiny. I think they really raised the
           | bar on what a tiny-and-complete compiler is.
           | (https://news.ycombinator.com/item?id=8558822
           | https://news.ycombinator.com/item?id=8746054)
        
             | p4bl0 wrote:
             | > Could you compile the compiler with itself, for example?
             | 
             | That's only possible if you write the compiler in the
             | language it compiles. By this account writing a Python
             | compiler in something else than Python, for example in C,
             | would make it incomplete? It doesn't make much sense.
             | 
             | > If you're taking a course (...)
             | 
             | You should read my post more carefully, I think.
        
             | tiagoleifert wrote:
             | > I was referring to the complete lack of flow control.
             | Could you compile the compiler with itself, for example?
             | 
             | This is not a requirement for it to be a compiler.
             | 
             | > They are also VM-based, and self-compiling, yet still
             | tiny. I think they really raised the bar on what a tiny-
             | and-complete compiler is.
             | 
             | The purpose of OP's compiler is educational so this
             | comparison makes no sense.
        
       | blacklion wrote:
       | Sooo nice :-) It looks like small cute kitten to me, sorry.
        
       | denotational wrote:
       | As another comment also alludes, there's probably quite a lot
       | more work required before this could be called a complete
       | compiler (without heavy qualification as to what it actually
       | compiles, since the language of boolean expressions probably
       | isn't something that most people would consider to be a
       | programming language per se); that said, toy languages like this
       | are a great way to explore some fundamental principles of
       | programming language theory.
       | 
       | Boolean expressions have very trivial denotational semantics
       | (since there's no recursion or state), so you could try
       | formalising these in Coq (FRAP [1] and Software Foundations Vol.
       | 2 [2] are both good places to learn more about this).
       | 
       | Next, you could write formal operational semantics for your VM,
       | and then finally prove that your compiler is correct in a formal
       | sense (i.e. the image of any well-formed input under the compiler
       | operation has a terminating execution in the operational
       | semantics that yields the same value as would be obtained by the
       | execution of the input expression under the denotational
       | semantics).
       | 
       | The last part would require implementing the compiler in Coq, but
       | since you've written it in OCaml it probably wouldn't be very
       | hard to port.
       | 
       | [1] : https://frap.csail.mit.edu/main
       | 
       | [2] : https://softwarefoundations.cis.upenn.edu/plf-
       | current/toc.ht...
       | 
       | Edit: Having looked at your profile I now realise that you
       | probably know all of this already, but it might be informative
       | for anyone else reading :)
        
       | prezjordan wrote:
       | Where can I learn more about mll and mly files? I've hand-rolled
       | parsers and it's always my least favorite part of toy languages.
        
         | [deleted]
        
         | a-dub wrote:
         | https://v2.ocaml.org/manual/lexyacc.html
        
         | tekknolagi wrote:
         | Search for ocamllex and ocamlyacc. Also Menhir.
        
           | mftb wrote:
           | So weird, within the _last hour_ I was wondering why I never
           | see regular, old, lex and yacc around anymore, and then I
           | came across this, which made realize it 's probably because
           | they generate C. Funny. I guess these output OCaml?
        
             | p4bl0 wrote:
             | Yes, they do. Lex and yacc equivalent tools are available
             | for many languages.
             | 
             | Examples :
             | 
             | - Python: https://ply.readthedocs.io/ (I already used it in
             | a very small project, again for my students, here is an
             | example of how it works: https://gist.github.com/p4bl0-/f5e
             | d1e60fdc5e76a2d321bc8708a7...)
             | 
             | - Racket: https://docs.racket-lang.org/parser-tools/ (it's
             | really great, I previously used Racket for my compiler
             | course but I've switched to OCaml a few years back, because
             | Racket parser tools don't go well with Typed Racket).
        
       | skybrian wrote:
       | I think people are getting hung up on the word "complete." This
       | is a complete compiler in the sense that "hello world" is a
       | complete (but minimal) program.
        
         | p4bl0 wrote:
         | Yes exactly :), I meant that it does everything from lexing the
         | source code to emitting bytecode instructions (and provides the
         | VM to run this bytecode). It is not a part of a compiler, it is
         | a tiny, simple, stupid, but _complete_ , compiler.
        
       ___________________________________________________________________
       (page generated 2022-07-16 23:00 UTC)