Tiny C Compiler

(bellard.org)

152 points | by guerrilla 12 hours ago ago

69 comments

  • fjfaase 10 hours ago ago

    The code of TCC (0.9.26) is kind of hard to compile, as I have discovered over the past year while developing a minimal C compiler to compile the TCC sources [1]. For that reason, I have concluded that TCC is its own test set. It uses the constant 0x80000000, which is an edge case if you want to print it as a signed integer using only 32-bit operators. There is a switch statement with a post-increment operator in the switch expression. There are also switch statements with fall-throughs and with goto statements in the cases. It uses the ## operator where the result is the name of a macro. Just to name a few.

    [1] https://github.com/FransFaase/MES-replacement
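
    A minimal sketch of the constructs listed above, for anyone who wants to poke at their own compiler with them. All names here (`format_signed32`, `parse_sign`, `LIMIT`) are made up for illustration and are not taken from the TCC sources; only the kinds of construct come from the comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is copied from TCC. */

/* Write the signed decimal form of a 32-bit pattern using only 32-bit
   unsigned arithmetic. 0x80000000 is the awkward input: its magnitude
   does not fit in a signed 32-bit int, so negate in unsigned space. */
void format_signed32(uint32_t bits, char out[12])
{
    char digits[10];
    int n = 0, o = 0;
    uint32_t mag = bits;

    assert(out != 0);
    if (bits & 0x80000000u) {
        out[o++] = '-';
        mag = (uint32_t)(0u - bits);   /* unsigned negate, wraps safely */
    }
    do {
        digits[n++] = (char)('0' + mag % 10u);
        mag /= 10u;
    } while (mag != 0u);
    while (n > 0)
        out[o++] = digits[--n];
    out[o] = '\0';
}

/* Switch on a post-incremented index, with a fall-through and a goto
   inside the cases. Returns the sign and reports how many characters
   of s were consumed. */
int parse_sign(const char *s, int *consumed)
{
    int i = 0;
    int sign = 1;

    switch (s[i++]) {      /* the index is bumped before any case runs */
    case '-':
        sign = -1;
        /* fall through */
    case '+':
        goto done;         /* goto from inside a case */
    default:
        i = 0;             /* not a sign character: consume nothing */
        break;
    }
done:
    *consumed = i;
    return sign;
}

#define LIMIT_SMALL 16
#define LIMIT_LARGE 4096
/* The pasted token (LIMIT_SMALL or LIMIT_LARGE) is itself a macro name,
   so the preprocessor has to rescan and expand the result of ##. */
#define LIMIT(kind) LIMIT_##kind

int small_limit(void) { return LIMIT(SMALL); }
int large_limit(void) { return LIMIT(LARGE); }
```

    Calling `format_signed32(0x80000000u, buf)` should leave `-2147483648` in `buf`, and `LIMIT(SMALL)` should expand to 16 only if the compiler rescans the pasted token.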

    • ForOldHack 8 hours ago ago

      You have simply made one tiny step, that the guys who used AI and $25,000 to write a C compiler in Rust, could not make:

      You are using the compiler to compile itself.

      "TCC is its own test set." Absolutely brilliant.

      • eichin 8 hours ago ago

        Back in the 90s gcc did a three-stage build to isolate the result from weakness in the vendor native compiler (so, vendor builds gcc0, gcc0 builds gcc1, gcc1 builds gcc2 - and you compare gcc2 to gcc1 to look for problems.) It was popularly considered a "self test suite" until someone did some actual profiling and concluded that gcc only needed about 20% of gcc to compile itself :-)

    • eulgro 7 hours ago ago

      To be honest, these all seem like pretty basic features.

      Goto is easier to implement than an if statement. Postincrement behaves no differently in a switch statement than elsewhere.

  • ZoomZoomZoom 8 hours ago ago

    One of the coolest tricks is using tcc to compile "on demand." This allows you to use a compiled language like Nim for scripting, with almost no noticeable performance difference compared to interpreted languages.

      #!/usr/bin/env -S nim r --cc:tcc -d:useMalloc --verbosity:0 --hints:off --tlsEmulation:on --passL:-lm
      echo "Hello from Nim via TCC!"
    
    Here's a comparison (bash script at [1]) of a minimal binary compiled this way with different interpreters. First line is the noise. Measured by tim[2] written by @cb321.

      1.151 +- 0.028 ms     (AlreadySubtracted)Overhead
      1.219 +- 0.037 ms     bash -c exit
      2.498 +- 0.040 ms     fish --no-config --private -c exit
      1.682 +- 0.058 ms     perl   -e 'exit 0'
      1.621 +- 0.043 ms     gawk   'BEGIN{exit 0}'
      15.8 +- 2.2 ms        python3 -c 'exit(0)'
      20.0 +- 5.7 ms        node   -e 'process.exit(0)'
      -2.384 +- 0.041 ms    tcc -run x.c
      153.2 +- 4.6 ms       nim r --cc:tcc  x.nim
      164.5 +- 1.2 ms       nim r --cc:tcc -d:release x.nim
    
    Measured on a laptop without any care to clean the environment, except turning on the performance governor. Even with `-d:release`, compiling Nim code is comparable.

    The fact that tcc compilation cycle measures negative here is a nice punchline.

    [1]: https://gist.github.com/ZoomRmc/58743a34d3bb222aa5ec02a5e2b6...

    [2]: https://github.com/c-blake/bu/blob/main/doc/tim.md

    • spijdar 7 hours ago ago

      It's worth pointing out that Nim is going to cache all of the compilation up to the linking step. If you want to include the full compilation time, you'd need to add --forceBuild to the Nim compiler.

      (Since a lot of the stuff you'd use this for doesn't change often, I don't think this invalidates the "point", since it makes "nim r" run very quickly, but still)

      There's also the Nim interpreter built into the compiler, "NimScript", which can be invoked like:

        #!/usr/bin/env -S nim e --hints:off
        echo "Hello from Nim!"
      
      The cool thing is that, without --forceBuild, Nim + TCC (as a linker) has a faster startup time than NimScript. But if you include compile time, NimScript wins.

      • ZoomZoomZoom 7 hours ago ago

        Yep, always forget about '--forceBuild'. You can see in the script above that the nimcache directory was overridden to tmpfs for the measurement, though. Caching will be helpful in real use cases, of course.

        NimScript is cool but very limited, not being able to use parts of the stdlib that depend on C. Hope this will change with Nimony/Nim 3.

  • gnufx 11 hours ago ago
    • guerrilla 10 hours ago ago

      Riiiight, I forgot about that.

  • veltas 10 hours ago ago

    The unofficial repo continuing tcc has geoblocked the UK.

    https://repo.or.cz/tinycc.git

    • jeremyjh 9 hours ago ago

      This is the most rational response to poorly written laws. If everyone did it, maybe they would repeal that law.

      https://repo.or.cz/uk-blocked.html

      • problynought 9 hours ago ago

        The most rational response to poorly written laws is collective action against government that wrote them.

        But that would require terminally online frogs acting in their collective interests, not isolating at home hoping the heat never reaches them.

        • zbentley 8 hours ago ago

          The copy on the linked "UK geoblocking" page doesn't contradict that, though.

          The authors say, basically, that there's a risk of prosecution in the UK that would financially devastate anyone that works on the project, and that the act of determining how to comply with UK laws is itself an extremely resource-intensive legal task that they can't or won't do. In other words, they're geoblocking the UK not out of activism but out of pragmatic self-preservation.

          That's not in any way mutually exclusive with collective action.

          ...also, couldn't deciding to geoblock the UK be a form of collective action? If that's what you originally meant, I sincerely apologize for reading it backwards.

        • nine_k 5 hours ago ago

          If you're not a citizen, maybe you don't get to take part in the collective action to repeal a law, at least not as easily.

        • jeremyjh 7 hours ago ago

          Is everyone blocking them not collective action?

  • haunter 12 hours ago ago

    There is an actively maintained fork with RISC-V support and such

    https://repo.or.cz/w/tinycc.git

    https://github.com/TinyCC/tinycc

    • csb6 11 hours ago ago

      I've never seen another repo with public commit access like that. I guess the project is niche enough that you don't get spammed with bad or malicious commits.

      • haunter 11 hours ago ago

        Yeah it's basically anarchy (to some extent)

        https://repo.or.cz/h/mob.html

        >The idea is to provide unmoderated side channel for random contributors to work on a project, with similar rationale as e.g. Wikipedia - that given enough interested people, the quality will grow rapidly and occassional "vandalism" will get fixed quickly. Of course this may not work nearly as well for software, but here we are, to give it a try.

      • riffraff 11 hours ago ago

        When pugs (a perl6 implementation in Haskell) was a thing, you gained commit access by asking and it was immediately granted to everyone. It was insane and awesome.

        • nurettin 3 hours ago ago

          This was my experience in the early 2000s with SourceForge. You went to the related IRC channel, introduced yourself, asked for access and they would add you to the project. You could work on a game that you liked, a Jabber client, and even Code::Blocks at some point. Boost (C++ libraries) was more serious: you'd have to create the implementation and documentation according to their format and post it to the forum, then they would ask you to defend certain parts or reject it due to bloat/DRY/unnecessary.

          Everything felt more like a community effort back then.

    • veltas 10 hours ago ago

      I would be interested in contributing to this but the UK is geoblocked.

      • LarsKrimi 8 hours ago ago

        Are you sure you are geoblocked, and that it's not just the SSH host key change from 2022?

        Actual geoblocks can be confounding, of course. After Brexit I've personally thought of blocking UK phone numbers from calling me, though... So it could just as well be intentional.

    • einpoklum 11 hours ago ago

      It is also interesting to note that while the repository is quite active, there has not been any release for _8 years_, and the website is the same one at the top of this conversation, i.e. the one where the old maintainer says he quit and the benchmarks are from 20 years ago.

      A small and minimalistic C compiler is actually a very important foundational project for the software world IMNSHO.

      I'm definitely reminded of: https://xkcd.com/2347/

  • kristianp 11 hours ago ago

    Does anyone use libtcc for a scripting language backend? Smaller and faster than llvm. You'd have to transpile to a C ast I imagine.

    • kgeist 11 hours ago ago

      Years ago I built a scripting language that transpiled to TCC and then compiled to machine code in memory. It produced human-readable C code so it was very easy to get going: when debugging the compiler I could just look at the generated C code without having to learn any special infrastructure/ecosystem/syntax etc. Plus basically zero-overhead interop with C out of the box => immediate access to a lot of existing libraries (although a few differences in calling conventions between TCC and GCC did bite me once). Another feature I had was "inline C" if you wanted to go low level, it was super trivial to add, too. It was pretty fast, maybe two times slower than GCC, IIRC, but more than enough for a scripting language.
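
      To make the "transpile to human-readable C" idea concrete, here is a minimal sketch of an emitter that turns a toy expression tree into C source text. Everything in it (`Expr`, `Buf`, `emit_expr`, `emit_function`) is hypothetical and is not the design of the language described above; the resulting string is simply the kind of input one would hand to libtcc's `tcc_compile_string`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST for a toy scripting language (illustration only). */
typedef enum { EXPR_NUM, EXPR_VAR, EXPR_BIN } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                    /* used by EXPR_NUM */
    const char *name;             /* used by EXPR_VAR */
    char op;                      /* used by EXPR_BIN: '+', '-', '*', '/' */
    const struct Expr *lhs, *rhs; /* used by EXPR_BIN */
} Expr;

/* Fixed-capacity output buffer; data is always kept NUL-terminated. */
typedef struct { char *data; size_t cap, len; } Buf;

/* Append s, truncating silently instead of overflowing. */
void buf_puts(Buf *b, const char *s)
{
    size_t room, n;

    assert(b != NULL && b->data != NULL && b->len < b->cap);
    room = b->cap - b->len - 1;   /* keep one byte for the terminator */
    n = strlen(s);
    if (n > room)
        n = room;
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Emit a fully parenthesised C expression, so operator precedence in
   the generated code never depends on the shape of the tree. */
void emit_expr(Buf *b, const Expr *e)
{
    char tmp[16];

    switch (e->kind) {
    case EXPR_NUM:
        snprintf(tmp, sizeof tmp, "%d", e->value);
        buf_puts(b, tmp);
        break;
    case EXPR_VAR:
        buf_puts(b, e->name);
        break;
    case EXPR_BIN:
        buf_puts(b, "(");
        emit_expr(b, e->lhs);
        tmp[0] = ' '; tmp[1] = e->op; tmp[2] = ' '; tmp[3] = '\0';
        buf_puts(b, tmp);
        emit_expr(b, e->rhs);
        buf_puts(b, ")");
        break;
    }
}

/* Wrap an expression in a one-parameter int function definition. */
void emit_function(Buf *b, const char *name, const char *param,
                   const Expr *body)
{
    buf_puts(b, "int ");
    buf_puts(b, name);
    buf_puts(b, "(int ");
    buf_puts(b, param);
    buf_puts(b, ")\n{\n    return ");
    emit_expr(b, body);
    buf_puts(b, ";\n}\n");
}
```

      For the tree `x * 2 + 1` this produces a small `int f(int x)` definition returning `((x * 2) + 1)`, which is easy to read back when debugging the front end.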

    • olivia-banks 11 hours ago ago

      libtcc doesn't give you much control AST wise, you basically just feed it strings. I'm using it for the purpose you mentioned though--scripting language backend--since for my current "scripting-language" project I can emit C89, and it's plenty fast enough for a REPL!

          /* add a file (either a C file, dll, an object, a library or an ld script). Return -1 if error. */
          int tcc_add_file(TCCState *s, const char *filename);
      
          /* compile a string containing a C source. Return non zero if error. */
          int tcc_compile_string(TCCState *s, const char *buf);

  • imwally 9 hours ago ago

    Anyone know a good resource for getting started writing a compiler? I'm not trying to write a new LLVM, but being a "software engineer" writing web-based APIs for a living is leaving me wanting more.

  • senfiaj 10 hours ago ago

    There is an even smaller C compiler that fits within 512 bytes: https://xorvoid.com/sectorc.html

  • asdefghyk 10 hours ago ago

    I recall there were similar items back in the late 70s and early 80s.

    Tiny C and Small C are names I seem to recall, but it's very fuzzy. Not sure if they were compilers; they may have been interpreters...

    • akritid 7 hours ago ago

      You probably remember cint

  • Dwedit 7 hours ago ago

    What's the quality of the generated code like? Does it use explicit stack frames and all local variables live there? Does it move loop-invariant operations out of a loop? Does it store variables in registers?

  • Jotalea 8 hours ago ago

    TCC is my go-to for keeping builds lean. On Windows specifically, you get a functional C compiler in a few hundred KB, whereas the standard alternatives require gigabytes of disk space (that I don't have to spare) and complex environment setups.

  • throwatdem12311 11 hours ago ago

    This was the compiler I was required to use for my courses in university. GCC was forbidden. The professor just really liked tcc for some reason.

    • II2II 10 hours ago ago

      > The professor just really liked tcc for some reason.

      Perhaps, or maybe they just got tired of students coming in and claiming that their program worked perfectly on such-and-such compiler.[1] It looks like tcc would run on most systems from the time of its introduction, and perhaps some that are a great deal older. When I took a few computer science courses, they were much more restrictive. All code had to be compiled with a particular compiler on their computers, and tested on their computers. They said it was to prevent cheating but, given how trivial it would have been to cheat with their setup, I suspect it had more to do with shutting down arguments with students who came in to argue over grades.

      [1] I was a TA in the physical sciences for a few years. Some students would try to argue anything for a grade, and would persist if you let them.

      • dymk 9 hours ago ago

        The prof could have just said "Use GCC <version>" then, which would run on even more systems than TCC. Professor probably just really liked TCC.

    • mort96 11 hours ago ago

      Seems like a good way to get students to write C rather than GNU C.

      • uecker 11 hours ago ago

        TCC - just like many other C compilers - supports many GNU extensions.
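
        As a small illustration of how easily a GNU extension slips in, here is a sketch using case ranges, which TCC's documentation lists among its supported GNU C extensions. The function name `hex_value` and the `HAVE_CASE_RANGES` macro are made up for this example. As far as I know, GCC only diagnoses this construct when a pedantic flag is added on top of `-std=c99`, so treat that part as an assumption to check against your compiler version.

```c
#include <assert.h>

/* Case ranges ("case lo ... hi:") are a GNU extension, not ISO C.
   GCC and Clang accept them; TCC documents them as supported.
   HAVE_CASE_RANGES is a hypothetical macro for this sketch. */
#if defined(__GNUC__) || defined(__TINYC__)
#define HAVE_CASE_RANGES 1
#else
#define HAVE_CASE_RANGES 0
#endif

/* Return the value of a hexadecimal digit, or -1 if c is not one. */
int hex_value(int c)
{
#if HAVE_CASE_RANGES
    switch (c) {
    case '0' ... '9': return c - '0';
    case 'a' ... 'f': return c - 'a' + 10;   /* assumes ASCII-like a-f */
    case 'A' ... 'F': return c - 'A' + 10;
    default:          return -1;
    }
#else
    /* Portable ISO C spelling of the same logic. */
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
#endif
}

/* Tiny self-check that behaves the same on both code paths. */
void hex_value_selfcheck(void)
{
    assert(hex_value('0') == 0);
    assert(hex_value('f') == 15);
    assert(hex_value('z') == -1);
}
```

        Both branches compute the same thing, so code like this would pass on a course machine whichever compiler was mandated; only the first branch is outside ISO C.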

      • einpoklum 11 hours ago ago

        The professor could have just insisted on `-std=c99` or a similar GCC flag which disallows GNU extensions.

        When I taught programming (I started teaching 22 years ago), the course was still having students either use GCC with their university shell accounts, or if they were Windows people, they would use Borland C++ we could provide under some kind of fair use arrangement IIANM, and that worked within a command shell on Windows.

        • actionfromafar 11 hours ago ago

          On the other hand, with tcc, you'd know exactly what you were dealing with.

          I used it just the other day to do some tests. No dependencies, no fiddling around with libwhater-1.0.dll or stuff like that when on Windows and so on.

  • deivid 11 hours ago ago

    TCC is fantastic! Very hackable, easy to compile to WASM for some interesting in-browser compilation

    • yjftsjthsd-h 9 hours ago ago

      I thought it only targeted x86? What's the point running in a browser?

  • pixelsort 9 hours ago ago

    Currently striving towards my own TypeScript to native x86_64 physical compiler quine bootstrapped off of TCC and QuickJS. Bytecode and AST are there!

    • jrop 9 hours ago ago

      This sounds like a really cool project. What challenges have you encountered so far?

      • pixelsort 9 hours ago ago

        Thanks. The hardest part has been slogging through the segfaults and documenting all the unprincipled things I've had to add. Post-bootstrap, I have to undo it all because my IR is a semantically rich JSON format that is turing-incomplete by design. I'm building a substrate for rich applications over bounded computation, like eBPF but for applications and inference.

  • rustyhancock 12 hours ago ago

    What a blast from the past TCC!

    Sad but not surprised to see it's no longer maintained (8 years ago!).

    Even in the era of terabyte NVMe drives my eyes water when I install MSVC (and that's usually just for the linker!)

    • antirez 12 hours ago ago

      That is, I believe, one of the many points of contact between AI and open source. Something like TCC, with a good coding agent and a developer who cares about the project and knows enough about it, can turn into a project that can be maintained without the otherwise large effort that led to it being abandoned. I'm resurrecting many projects of mine that I no longer had the time to handle, like dump1090, linenoise, ...

    • shakna 11 hours ago ago

      Still maintained. You have the mob repo in another comment.

      Debian, Fedora, Arch and others pull their package from the mob repo. They're pretty good at pulling in CVE fixes almost immediately.

      Thomas Preud'homme is the new maintainer lead, though the code is a mob approach.

    • pkal 12 hours ago ago

      I don't think it is unmaintained; there is plenty of activity going on in the repo: https://repo.or.cz/tinycc.git. They just don't seem to be cutting releases?

    • kristianp 12 hours ago ago

      There's still activity on the mailing list. It may still be maintained.

      https://lists.nongnu.org/archive/html/tinycc-devel/2026-02/t...

  • pbohun 11 hours ago ago

    There also is an unofficial mirror which has updates.

    https://github.com/TinyCC/tinycc

    • _kst_ 9 hours ago ago

      That has the same content as git://repo.or.cz/tinycc.git

  • olivia-banks 11 hours ago ago

    TCC is fantastic! I use it a lot to do fast native-code generation for language projects, and it works really really well.

  • markus_zhang 11 hours ago ago

    I mixed it up with LCC which was used in Quake 3. Still this is pretty cool.

  • RobotToaster 9 hours ago ago

    What advantage does this have over SDCC?

  • 1vuio0pswjnm7 6 hours ago ago

    Does it work on NetBSD yet?

  • kimixa 11 hours ago ago

    Man I can't wait for tcc to be reposted for the 4th time this week with the license scrubbed and the comment of "The Latest AI just zero-shotted an entire C compiler in 5 minutes!"

    • overgard 11 hours ago ago

      And the subsequent youtube hype videos of "COMPILER WRITING IS OVER!"

    • resonious 11 hours ago ago

      There actually was an article like this from Anthropic the other day but instead of 5 minutes I think it was weeks and $20,000 worth of tokens. Don't have the link handy though.

      • Barbing 9 hours ago ago

        Sixteen Claude AI agents working together created a new C compiler - Ars Technica

        https://arstechnica.com/ai/2026/02/sixteen-claude-ai-agents-...

        > The $20,000 experiment compiled a Linux kernel but needed deep human management.

        We tasked Opus 4.6 using agent teams to build a C Compiler | Hacker News

        https://news.ycombinator.com/item?id=46903616

        • logicprog 9 hours ago ago

          Quoting my sibling comment:

          Except it was written in a completely different language (Rust), which likely would have necessitated a completely different architecture, and nobody has established any relationship either algorithmically or on any other level between that compiler and TCC. Additionally, Anthropic's compiler supports x86_64 (partially), ARM, and RISC-V, whereas TCC supports x86, x86_64, and ARM. Additionally, TCC is only known to be able to boot a modified version of the Linux 2.4 kernel[1] instead of an unmodified version of Linux 6.9.

          Additionally, it is extremely unlikely for a model to be able to regurgitate this many tokens of something, especially translated into another language, especially without being prompted with the starting set of tokens in order to specifically direct it to do that regurgitation.

          So, whatever you want to say about the general idea that all model output is plagiarism of patterns it's already seen or something, it seems pretty clear to me that this does not fit the hyperbolic description put forward in the parent comments.

          [1]: https://www.bellard.org/tcc/tccboot.html

      • logicprog 10 hours ago ago

        Except it was written in a completely different language (Rust), which would have necessitated a completely different architecture, and nobody has established any relationship either algorithmically or on any other level between that compiler and TCC.

    • rowanG077 10 hours ago ago

      I may have missed this. Do you have a link when AI verbatim copied tcc and it was publicized? I have my doubts.

    • logicprog 10 hours ago ago

      I don't understand what you could possibly be talking about. Do you care to elaborate?