[HN Gopher] A complete compiler and VM in 150 lines of code ___________________________________________________________________ A complete compiler and VM in 150 lines of code Author : p4bl0 Score : 62 points Date : 2022-07-16 20:51 UTC (2 hours ago) (HTM) web link (gist.github.com) (TXT) w3m dump (gist.github.com) | userbinator wrote: | For those wondering what language this is for before clicking on | it: it's not much more than what I think would traditionally be | called an expression evaluator, and _very_ simple one at that --- | it only recognises binary logic expressions, and the 150 lines | doesn 't include the parser or lexer either. Not a bad attempt | for what is presumably a first try, but I think "complete | compiler" is overselling it. | a-dub wrote: | there's no type checking, which i'd say is a fairly integral | part of a compiler... | | but it does use a parser generator to generate a parser that | constructs an ast and it then iterates that ast to generate a | bytecode. | | that's basically a compiler, even if the language is basic | expressions. | userbinator wrote: | B has no type checking either. | | On the other hand, using a parser generator and not counting | its output in the line count does seem like cheating. | p4bl0 wrote: | > using a parser generator and not counting its output in | the line count does seem like cheating | | Yeah maybe I should count OCaml's standard library lines of | code too. Maybe also count expanded macros and inlined | functions at each call site? At which point should I stop? | Are assembly language pseudo-instructions cheating too? | | Common, don't be ridiculous. | p4bl0 wrote: | Hey there, author here :). The language is intentionally _very_ | simple indeed, because the goal is to showcase all | indispensable steps of a complete compiler. I initially wrote | it for my students as an introduction to my compiler class (for | the very first session). | | > would traditionally be called an expression evaluator | | It is not an evaluator, evaluators are included (the different | eval functions). | | > the 150 lines doesn't include the parser or lexer either | | This is just wrong: $ cat *.ml{,l,y} *.c | | sed '/^\s*$/d' | wc -l 149 | | and these 149 lines (okay, 169 if you count blank lines which | are removed using sed in the count above) include the two | `eval` and the `run` functions which are not actually part of | the compiler but here for clarity (they help understanding how | the successive intermediate representations of the boolean | expressions work -- without them the entire project is less | than 130 lines). | | > Not a bad attempt for what is presumably a first try | | It's not ;). It is however targeted at beginner compsci | students who have absolutely no idea how compilers are built | yet :). | | > but I think "complete compiler" is overselling it. | | It does lexing & parsing to produce an AST of a boolean | expression, then a traversal of this AST to transform it into a | second intermediate representation (an AST of NAND-only | expression), then _compiles_ the expression into a list of | instructions (and this paradigm shift is a big step that is | tough to teach to some students, believe me ^^), and then emits | the corresponding bytecode (which here is 1:1 with the | imperative instructions), and then the VM interprets this | bytecode. I don 't think I'm overselling it :). | | The only way it is not complete -- and that's written in the | README -- is that it has absolutely no semantics analysis | (there is only one type, and no names, so there is nothing to | analyze ^^), but that is not _mandatory_ to be called a | compiler. | | Then again, this is only the first hour of a semester long | compiler class (which covers semantic analysis, don't worry) | ;). | userbinator wrote: | _The only way it is not complete -- and that 's written in | the README -- is that it has absolutely no semantics | analysis_ | | I was referring to the complete lack of flow control. Could | you compile the compiler with itself, for example? | | _Then again, this is only the first hour of a semester long | compiler class_ | | If you're taking a course, you might enjoy "leaping ahead" by | studying C4 and C4x86. They are also VM-based, and _self- | compiling_ , yet still tiny. I think they really raised the | bar on what a tiny-and-complete compiler is. | (https://news.ycombinator.com/item?id=8558822 | https://news.ycombinator.com/item?id=8746054) | p4bl0 wrote: | > Could you compile the compiler with itself, for example? | | That's only possible if you write the compiler in the | language it compiles. By this account writing a Python | compiler in something else than Python, for example in C, | would make it incomplete? It doesn't make much sense. | | > If you're taking a course (...) | | You should read my post more carefully, I think. | tiagoleifert wrote: | > I was referring to the complete lack of flow control. | Could you compile the compiler with itself, for example? | | This is not a requirement for it to be a compiler. | | > They are also VM-based, and self-compiling, yet still | tiny. I think they really raised the bar on what a tiny- | and-complete compiler is. | | The purpose of OP's compiler is educational so this | comparison makes no sense. | blacklion wrote: | Sooo nice :-) It looks like small cute kitten to me, sorry. | denotational wrote: | As another comment also alludes, there's probably quite a lot | more work required before this could be called a complete | compiler (without heavy qualification as to what it actually | compiles, since the language of boolean expressions probably | isn't something that most people would consider to be a | programming language per se); that said, toy languages like this | are a great way to explore some fundamental principles of | programming language theory. | | Boolean expressions have very trivial denotational semantics | (since there's no recursion or state), so you could try | formalising these in Coq (FRAP [1] and Software Foundations Vol. | 2 [2] are both good places to learn more about this). | | Next, you could write formal operational semantics for your VM, | and then finally prove that your compiler is correct in a formal | sense (i.e. the image of any well-formed input under the compiler | operation has a terminating execution in the operational | semantics that yields the same value as would be obtained by the | execution of the input expression under the denotational | semantics). | | The last part would require implementing the compiler in Coq, but | since you've written it in OCaml it probably wouldn't be very | hard to port. | | [1] : https://frap.csail.mit.edu/main | | [2] : https://softwarefoundations.cis.upenn.edu/plf- | current/toc.ht... | | Edit: Having looked at your profile I now realise that you | probably know all of this already, but it might be informative | for anyone else reading :) | prezjordan wrote: | Where can I learn more about mll and mly files? I've hand-rolled | parsers and it's always my least favorite part of toy languages. | [deleted] | a-dub wrote: | https://v2.ocaml.org/manual/lexyacc.html | tekknolagi wrote: | Search for ocamllex and ocamlyacc. Also Menhir. | mftb wrote: | So weird, within the _last hour_ I was wondering why I never | see regular, old, lex and yacc around anymore, and then I | came across this, which made realize it 's probably because | they generate C. Funny. I guess these output OCaml? | p4bl0 wrote: | Yes, they do. Lex and yacc equivalent tools are available | for many languages. | | Examples : | | - Python: https://ply.readthedocs.io/ (I already used it in | a very small project, again for my students, here is an | example of how it works: https://gist.github.com/p4bl0-/f5e | d1e60fdc5e76a2d321bc8708a7...) | | - Racket: https://docs.racket-lang.org/parser-tools/ (it's | really great, I previously used Racket for my compiler | course but I've switched to OCaml a few years back, because | Racket parser tools don't go well with Typed Racket). | skybrian wrote: | I think people are getting hung up on the word "complete." This | is a complete compiler in the sense that "hello world" is a | complete (but minimal) program. | p4bl0 wrote: | Yes exactly :), I meant that it does everything from lexing the | source code to emitting bytecode instructions (and provides the | VM to run this bytecode). It is not a part of a compiler, it is | a tiny, simple, stupid, but _complete_ , compiler. ___________________________________________________________________ (page generated 2022-07-16 23:00 UTC)