[HN Gopher] Tiny-C Compiler (2001)
       ___________________________________________________________________
        
       Tiny-C Compiler (2001)
        
       Author : swatson741
       Score  : 199 points
       Date   : 2023-03-13 10:30 UTC (12 hours ago)
        
 (HTM) web link (www.iro.umontreal.ca)
 (TXT) w3m dump (www.iro.umontreal.ca)
        
       | WoodenChair wrote:
       | This is an interpreter for a super restricted subset of C and it
       | looks well written from a pedagogical standpoint (keeps things
       | pretty simple, fairly easy to read). But it's slightly awkward to
       | strip-down a language (what features do you keep, what do you
       | lose?). I think it's more fun to build an interpreter for an
       | actual tiny language. In my next book I have interpreters for
       | Brainfuck [0], an obfuscated kind of joke of a language, and Tiny
       | BASIC[1] a real tiny language that was used on early personal
       | computers. These are pretty common first projects for folks
       | interested in doing an interpreter.
       | 
       | Here's why real languages are better than stripped down
       | languages: Anyone with programming knowledge can implement a
       | Brainfuck interpreter in a few hours and run any Brainfuck
       | program. Anyone with a tiny bit of CS knowledge can implement a
       | Tiny BASIC interpreter in just a day and then you can run any
       | real Tiny BASIC program from the late 70s. It's cool to run real
       | programs people actually used. With this stripped down C, there
       | are no pre-made real programs...
       | 
       | 0:https://en.wikipedia.org/wiki/Brainfuck
       | 1:https://en.wikipedia.org/wiki/Tiny_BASIC
        
         | Gordonjcp wrote:
         | FORTH is another language that's quick and easy to write from
         | scratch, where you need a couple of dozen words written in
         | assembler and then the rest of FORTH can be written in FORTH.
        
         | doodlesdev wrote:
         | Another language that's more modern and currently useful but
         | which is very tiny to write an interpreter for is Lua [0][1].
         | Currently the official Lua interpreter has around 30k LOC which
         | I find pretty amusing for a language used so widely in games
         | and for scripting purposes [2]. Of course it's still at least
         | an order of magnitude larger than a small Tiny BASIC
         | interpreter but the fact it's a current language used in so
         | many places makes it even more interesting to make your for-fun
         | implementation.
         | 
         | Also related to small language implementations I find notable
         | PicoC [3] which is a C interpreter written in around 3k LOC of
         | C. Past discussion about it here 13 years ago [4].
         | 
         | [0]: https://www.lua.org/about.html
         | 
         | [1]: https://www.lua.org/spe.html
         | 
         | [2]: https://en.wikipedia.org/wiki/Lua_(programming_language)
         | 
         | [3]: https://gitlab.com/zsaleeba/picoc
         | 
         | [4]: https://news.ycombinator.com/item?id=1658890
        
         | benj111 wrote:
         | While I appreciate your point.
         | 
         | 1. You use the example of a tiny basic as a 'real language' and
         | I don't see how tiny basic is a 'real language', but tiny C is
         | a stripped down language.
         | 
         | 2. You can build on this to make a full c implementation. A
         | minimal c implementation that can potentially bootstrap a full
         | c environment is more useful than a brainfuck interpreter.
        
       | northernskys30 wrote:
       | I did my CS degree at umontreal and this was an assignment in a
       | second year class. This was a pretty interesting introduction to
       | compilers, and even if this is a toy subset of C, this was
       | challenging, at least for me. We would get 0 if there were any
       | memory leak, so we were pretty paranoid about it.
       | 
       | The second assignment was writing a Scheme interpreter.
        
         | ndiddy wrote:
         | That's kind of surprising they cared so much about memory use,
         | a lot of one-shot C programs such as compilers don't bother
         | freeing memory and let the OS clean up after them once they
         | exit.
        
           | ComputerGuru wrote:
           | I was about to comment and say the same thing, but as a
           | graded learning exercise there is certainly value in that
           | approach.
        
       | ttvecthrowaway wrote:
       | Not to be confused with https://bellard.org/tcc/, which is a tiny
       | compiler for the C language.
        
         | Laaas wrote:
         | I use tcc for all of my small C "scripts" for doing ioctls,
         | etc. Less bloat, suckless. I imagine most software would be
         | better off using tcc than gcc/clang. Performance isn't that
         | important in most cases.
        
           | notorandit wrote:
           | I think you are confusing the work of Fabrice Bellard with
           | this very one. The former is a C-language compiler. This one
           | is a compiler for a language called "Tiny C". Understandable
           | confusion, though.
        
             | [deleted]
        
           | doublepg23 wrote:
           | > Performance isn't that important in most cases.
           | 
           | Optimizing for storage space is...better?
        
             | vidarh wrote:
             | Since they say "scripts", note that tcc supports being
             | invoked in the shebang line. E.g.
             | #!/usr/bin/tcc -run
             | 
             | You _can_ do that with gcc/clang too (e.g. #if 0, #endif
             | to wrap a block of shell script to compile the current file
             | and execute the result) but a primary value of tcc is that
             | it _compiles fast_.
             | 
             | On a more philosophical note, the suckless approach is to
             | optimise for _simplicity_ not storage. It's perfectly
             | valid to disagree with that of course, but if simplicity
             | of the system as a whole is a consideration gcc and clang
             | don't really fit.
        
               | LukeShu wrote:
               | You can only _sort of_ do that with gcc/clang. The #if 0
               | trick relies on funny behavior that is in a few common
               | shells. When you try to execve(2) a script without a
               | proper #! shebang, the kernel will return ENOEXEC. Bash
               | will check for ENOEXEC then check a few heuristics to see
               | if it looks like a text file, and if it does, then it
               | will try to run it as a shell script.
               | 
               | This means that your script will work when run from a
               | shell, but won't work when exec()ed from a non-shell
               | program, which is a weird foot-gun.
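               | 
               | A minimal sketch of the kernel-side behavior described
               | above, assuming a POSIX system where /tmp allows
               | executables. The helper name script_gets_enoexec is
               | hypothetical. It writes a small executable text file,
               | calls execve(2) on it directly (no shell involved), and
               | reports whether the call failed with ENOEXEC.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: writes `contents` to an executable temp file and
 * tries to execve(2) it in a child process.
 * Returns 1 if execve failed with ENOEXEC, 0 if it did not (including
 * the case where the exec succeeded), and -1 on setup failure. */
int script_gets_enoexec(const char *contents)
{
    char path[] = "/tmp/enoexec-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    size_t len = strlen(contents);
    if (write(fd, contents, len) != (ssize_t)len || fchmod(fd, 0700) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd); /* close before exec so the file is not open for writing */

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        char *argv[] = { path, NULL };
        char *envp[] = { NULL };
        execve(path, argv, envp);
        /* Reached only if execve failed; 42 marks the ENOEXEC case. */
        _exit(errno == ENOEXEC ? 42 : 1);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return WIFEXITED(status) && WEXITSTATUS(status) == 42;
}
```

               | Calling script_gets_enoexec("echo hi\n") exercises the
               | no-shebang case; a shell would instead catch the error
               | and fall back to interpreting the file itself.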
        
           | LanternLight83 wrote:
           | Thanks for sharing! I've yet to go through my C phase, but
           | see it on the horizon, and will remember this and the shebang
           | trick.
        
             | kevin_thibedeau wrote:
             | This is a recommended practice for scripting with Nim if
             | you want a batteries-included language.
        
           | circuit10 wrote:
           | I feel like a lot of software written in C is written in C
           | for performance reasons. Obviously that's not always the case
           | and TCC is useful but I wouldn't say that most software
           | should use it
        
         | squarefoot wrote:
         | It is sad that tcc is unmaintained as it would be really useful
         | in small embedded systems. I just tried it on Debian and
         | compilation fails without #undefining CONFIG_TCC_MALLOC_HOOKS
         | in lib/bcheck.c. After compilation it passes tests, but they
         | warn that it could be unreliable.
        
           | jart wrote:
           | Try chibicc. It's x86_64 native and so much more readable as
           | a codebase than TCC.
        
           | dantrell wrote:
           | While Fabrice Bellard is no longer working on TCC [0] and an
           | official release tarball hasn't been packaged since version
           | 0.9.27 (5 years ago) the project is by no means unmaintained.
           | 
           | For details, check their current working repository [1] and
           | mailing list [2].
           | 
           | [0]: https://bellard.org/tcc/
           | 
           | [1]: https://repo.or.cz/tinycc.git
           | 
           | [2]: https://lists.nongnu.org/archive/html/tinycc-devel/
        
         | siliconunit wrote:
         | I'm quite confused, not the same project at all? To me tiny c
         | compiler always meant the bellard page. Super useful stuff for
         | micro hacky projects.
        
           | hawski wrote:
           | One could say that the one from this submission is Tiny-C
           | Compiler and Bellard's is Tiny C-Compiler.
        
           | Narishma wrote:
           | This is a compiler for a language called Tiny-C.
        
           | notorandit wrote:
           | I understand the confusion: it is more about "syntax
           | associativity"
           | 
           | (tiny C) compiler --> "This is a compiler for the Tiny-C
           | language"
           | 
           | vs
           | 
           | Tiny (C compiler) --> "TinyCC [...] is a small but hyper fast
           | C compiler"
           | 
           | That's it! ;-)
        
             | moffkalast wrote:
             | Now obviously the next step is to make a tiny tiny c
             | compiler compiler.
        
       | Koshkin wrote:
       | Sigh. I wish people would teach compilers using Oberon as an
       | example. One can write a small yet complete compiler for (what
       | turns out to be not-so-tiny) a language.
        
         | peacefulhat wrote:
         | Best to pick languages anybody has heard of.
        
       | stevekemp wrote:
       | That's a cute project, thanks for sharing.
       | 
       | I hacked in support for ">", ">=", and "<=" to match the "<"
       | support, but I just noticed that ints are truncated, so the
       | maximum value stored in a variable is 127.
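       | 
       | One way a 127 ceiling like this can arise, sketched under the
       | assumption that a value passes through a signed 8-bit cell at
       | some point (this is an illustration of the effect, not a reading
       | of the tinyc.c source). The helper names are hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: round-trips an int through one signed char cell.
 * Values outside SCHAR_MIN..SCHAR_MAX do not survive; the exact result
 * for them is implementation-defined in C. */
int through_byte_cell(int value)
{
    signed char cell = (signed char)value;
    return cell;
}

/* Returns 1 if `value` survives the round trip unchanged. */
int fits_in_byte_cell(int value)
{
    return value >= SCHAR_MIN && value <= SCHAR_MAX;
}
```

       | With 8-bit chars, 127 comes back unchanged while 128 does not,
       | which matches the limit observed above.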
        
       | bitwize wrote:
       | Oh, Marc Feeley. Wonder if we'll see a Tiny-C target for Gambit?
        
         | feeley wrote:
         | That's not on my TODO! But Gambit does have support for TCC.
         | For example you can use TCC to compile a file to a dynamically
         | loadable object file (aka shared library). The compilation is
         | faster than gcc and the code size is typically smaller too:
         | 
         |     $ cat hello.scm
         |     (display "hello!\n")
         |     $ gsc hello.scm
         |     $ gsi hello.o1
         |     hello!
         |     $ ls -l hello.o1   # this is generated by gcc
         |     -rwxrwxr-x 1 feeley feeley 18152 Mar 13 17:16 hello.o1
         |     $ rm hello.o1
         |     $ gsc -cc "tcc -shared" hello.scm
         |     $ gsi hello.o1
         |     hello!
         |     $ ls -l hello.o1   # this is generated by tcc
         |     -rwxrwxr-x 1 feeley feeley 4432 Mar 13 17:17 hello.o1
        
       | fernly wrote:
       | Um, excuse me, but there existed a Tiny-C in 1979. Whatever you
       | are talking about creating in 2000 is in no way an original idea.
       | 
       | References:
       | 
       | Dr. Dobb's Journal #32 (Feb 1979) page 41, review of Tiny-C User
       | Manual by Ted Shapin [0]
       | 
       | Dr. Dobb's Journal #35 (May 1979) page 37, "Tiny-C Interpreter on
       | C-Dos" by Ray Duncan[1]
       | 
       | Tiny-C Associates incorporated in Holmdel, NJ, March 1978 [2]
       | 
       | "Tiny C" trademark application filed 1979, cancelled 1987 [3]
       | 
       | There was also a "Small C", see DDJ #69 (July 1982) p. 66, "Small
       | C for the 9900" by Matthew Halfant[4]
       | 
       | [0]
       | https://archive.org/details/dr_dobbs_journal_vol_04_201803/p...
       | 
       | [1]
       | https://archive.org/details/dr_dobbs_journal_vol_04_201803/p...
       | 
       | [2] https://www.bizapedia.com/nj/tiny-c-associates.html
       | 
       | [3] https://alter.com/trademarks/tiny-c-73219160
       | 
       | [4]
       | https://archive.org/details/dr_dobbs_journal_vol_07_201803/p...
        
       | mati365 wrote:
       | Recently I've been working on a toy C compiler and x86 assembler
       | in TypeScript[1], and I can confirm that the amount of work that
       | has to be done to compile and print a simple Hello World is
       | astronomically huge (as is the satisfaction).
       | 
       | [1] https://github.com/Mati365/ts-c-compiler
        
         | Narishma wrote:
         | This isn't a C compiler though. It's a compiler for a language
         | called Tiny-C.
        
           | [deleted]
        
       | jokoon wrote:
       | first assignment would be to add the multiply and divide
       | operators...
       | 
       | I admit I have trouble understanding how the VM run() function
       | works... can anybody give some insight?
        
         | mav88 wrote:
         | The function runs through the program by incrementing the
         | program counter (*pc++) and dispatching on the instruction it
         | sees. It's a stack-based VM, so operand values (not
         | instructions) are pushed onto and popped from the stack
         | depending on the operation. Is there anything specific you
         | don't grok? Happy to help.
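         | 
         | A minimal sketch of that kind of fetch-and-dispatch loop. The
         | opcode set, the names vm_run and vm_globals, and the int-sized
         | code cells are assumptions made for illustration; they are not
         | copied from tinyc.c. There is no bounds checking on the stack.

```c
/* Hypothetical opcodes; each may be followed by one inline operand. */
enum { OP_PUSH, OP_FETCH, OP_STORE, OP_ADD, OP_LT,
       OP_POP, OP_JZ, OP_JMP, OP_HALT };

int vm_globals[26]; /* one slot per variable a..z */

void vm_run(const int *code)
{
    int stack[256];
    int *sp = stack;       /* points at the next free slot */
    const int *pc = code;  /* program counter */

    for (;;) {
        switch (*pc++) {   /* fetch the opcode, then advance */
        case OP_PUSH:  *sp++ = *pc++;                  break;
        case OP_FETCH: *sp++ = vm_globals[*pc++];      break;
        case OP_STORE: vm_globals[*pc++] = sp[-1];     break; /* value stays on stack */
        case OP_ADD:   sp[-2] = sp[-2] + sp[-1]; sp--; break;
        case OP_LT:    sp[-2] = sp[-2] < sp[-1]; sp--; break;
        case OP_POP:   sp--;                           break;
        case OP_JZ: {  /* pop the condition, jump if it is zero */
            int target = *pc++;
            if (*--sp == 0)
                pc = code + target;
            break;
        }
        case OP_JMP:   pc = code + *pc;                break;
        case OP_HALT:  return;
        }
    }
}

/* Runs bytecode equivalent to: i = 0; while (i < 5) i = i + 1;
 * and returns the final value of i. */
int vm_demo_count_to_five(void)
{
    enum { I = 'i' - 'a' };
    static const int program[] = {
        /*  0 */ OP_PUSH, 0, OP_STORE, I, OP_POP,
        /*  5 */ OP_FETCH, I, OP_PUSH, 5, OP_LT,
        /* 10 */ OP_JZ, 22,
        /* 12 */ OP_FETCH, I, OP_PUSH, 1, OP_ADD, OP_STORE, I, OP_POP,
        /* 20 */ OP_JMP, 5,
        /* 22 */ OP_HALT
    };
    vm_run(program);
    return vm_globals[I];
}
```

         | Each pass through the switch consumes one instruction plus
         | any inline operand, so *pc++ does both the fetch and the
         | advance in one expression.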
        
       | feeley wrote:
       | Author here. Just for context tinyc.c was created in 2000 (I
       | found the file in my archives and the last modification date is
       | January 12, 2001). I was not aware at the time of Fabrice
       | Bellard's work which after all won the IOCCC in 2001, so the
       | confusion with TCC was not intentional. My tinyc.c was meant to
       | teach the basics of compilers in a relatively accessible way,
       | from parsing to AST to code generation to bytecode interpreter.
       | And yes it is the subset of C that is tiny, not a tiny compiler
       | for the full C language.
        
         | bullen wrote:
         | I wish I had time to make a list of what would be required to
         | bootstrap this.
         | 
         | Either by adding complexity (more features to the compiler) or
         | dropping complexity (fewer C features in the implementation).
         | 
         | Did you ever look at that?
         | 
         | Edit: functions, enum, struct, arrays and maybe make all
         | variables/functions a-z?
         | 
         | Edit2: https://joyofsource.com/projects/bootstrappable-tcc.html
        
       | userbinator wrote:
       | It's unfortunately not self-compiling, but has a structure which
       | is very reminiscent of C4 --- another tiny C-subset compiler +
       | stack-based VM which is self-compiling:
       | 
       | https://news.ycombinator.com/item?id=8558822
       | 
       | The 26 predefined integer variables make this look like a variant
       | of minimal BASIC, except with structured control flow instead of
       | only GOTO.
        
       | bakul wrote:
       | This doesn't have types, functions, arrays or much error
       | checking. It has one char identifiers. I don't think we should
       | read into this any more than a tiny example or experiment by the
       | author.
        
         | Gordonjcp wrote:
         | So, it's the C equivalent of Tiny BASIC?
         | 
         | So, a Tiny C?
        
           | bakul wrote:
           | Not even that as it doesn't have function calls or even
           | print!
           | 
           | See Feeley's response for the proper context.
        
       | netgusto wrote:
       | It's worth noting that this is a compiler for the Tiny-C
       | language, and not as one might think a tiny compiler for the C
       | language.
        
         | susam wrote:
         | Yes, a better title would be:
         | 
         | Compiler for the Tiny-C Language (2001)
         | 
         | In fact, that is exactly how the source code describes itself
         | in the comments.
        
         | unwind wrote:
         | It's probably better to call it an interpreter, since it will
         | also run the program and print the values of all non-zero
         | variables afterward.
         | 
         | Calling it a compiler is (to me) really stretching things, I
         | can't see any code to emit any other form of the code, it's all
         | aimed at evaluating (executing) it.
         | 
         | Edit: oops, I didn't read the code closely enough, it does emit
         | code but only internally, that code is what gets executed.
         | Thanks for the corrections!
        
           | northernskys30 wrote:
           | It compiles to a sort of byte code that is executed by a
           | stack based virtual machine.
        
           | userbinator wrote:
           | It is a compiler rather than a direct evaluator, since it
           | generates bytecode for a stack VM --- and also includes the
           | interpreter for that (look at the bottom).
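           | 
           | A minimal sketch of the compile step in that sense: a
           | post-order walk over an expression tree that emits stack-VM
           | bytecode. The node layout, opcode names, and the functions
           | gen and gen_demo are hypothetical, chosen for illustration
           | rather than taken from the submitted source.

```c
#include <stddef.h>

/* Hypothetical AST node: constants and variables are leaves. */
enum node_kind { N_CONST, N_VAR, N_ADD, N_SUB };

struct node {
    enum node_kind kind;
    int value;                    /* constant value or variable index */
    const struct node *lhs, *rhs; /* operands for N_ADD / N_SUB */
};

/* Hypothetical opcodes for a stack VM. */
enum { OP_PUSH, OP_FETCH, OP_ADD, OP_SUB, OP_HALT };

/* Emits code for `n` at `out` and returns the new write position.
 * Operands are emitted before their operator, so at run time both
 * values are already on the stack when the operator executes. */
int *gen(const struct node *n, int *out)
{
    switch (n->kind) {
    case N_CONST: *out++ = OP_PUSH;  *out++ = n->value; break;
    case N_VAR:   *out++ = OP_FETCH; *out++ = n->value; break;
    case N_ADD:
        out = gen(n->lhs, out);
        out = gen(n->rhs, out);
        *out++ = OP_ADD;
        break;
    case N_SUB:
        out = gen(n->lhs, out);
        out = gen(n->rhs, out);
        *out++ = OP_SUB;
        break;
    }
    return out;
}

int demo_code[16]; /* output buffer for the demo below */

/* Compiles (1 + 2) - 3 into demo_code and returns the number of
 * cells emitted, including the trailing OP_HALT. */
int gen_demo(void)
{
    static const struct node one   = { N_CONST, 1, NULL, NULL };
    static const struct node two   = { N_CONST, 2, NULL, NULL };
    static const struct node three = { N_CONST, 3, NULL, NULL };
    static const struct node sum   = { N_ADD, 0, &one, &two };
    static const struct node diff  = { N_SUB, 0, &sum, &three };

    int *end = gen(&diff, demo_code);
    *end++ = OP_HALT;
    return (int)(end - demo_code);
}
```

           | The output is a flat instruction sequence (PUSH 1, PUSH 2,
           | ADD, PUSH 3, SUB, HALT) that a separate loop then executes,
           | which is the distinction from evaluating the tree directly.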
        
             | masklinn wrote:
             | That's more or less every interpreter. CPython compiles to
             | bytecode before interpreting that, yet nobody would call it
             | a compiler.
        
               | Mike_12345 wrote:
               | That is definitely a compiler and anyone with a CS degree
               | would call it that if they were discussing its
               | functionality, because that's technically what it is.
               | (Referring specifically to the part which compiles Python
               | to bytecode)
               | 
               | Your SQL database also has a compiler. SQL is compiled to
               | an execution plan. Compile doesn't only mean "create a
               | machine code executable file".
        
               | masklinn wrote:
               | > That is definitely a compiler and anyone with a CS
               | degree would call it that if they were discussing its
               | functionality because that's technically what it is.
               | 
               | None of these assertions is correct.
               | 
               | > (Referring specifically to the part which compiles
               | Python to bytecode)
               | 
               | So referring specifically to something different than
               | what I explicitly specified, it's called something else.
               | 
               | By that reasoning, a cow is a muscle and you are an acid.
               | 
               | > Your SQL database also has a compiler.
               | 
               | "Has a" and "is a" are rather different relationships.
               | 
               | > Compile doesn't only mean "create a machine code
               | executable file".
               | 
               | You're the only person who made that assertion.
        
               | shadowfox wrote:
               | In contrast, Java also did that and I doubt if most
               | people think of Java as interpreted. So, using a bytecode
               | interpreter may not be the criterion most people are
               | using to decide on this. Truthfully, I think it is all a
               | bit arbitrary.
        
       | [deleted]
        
       | zabzonk wrote:
       | not sure i understand how enums work here. but interesting.
        
       ___________________________________________________________________
       (page generated 2023-03-13 23:01 UTC)