
Couldn’t a tcc or similarly simple C compiler be used instead of a 100MB Clang? Where’s the C to wasm compiler hiding?


One issue with Wasm is you essentially can't target it with a single-pass compiler, unlike just about any real machine. Wasm can only represent reducible control flow, so you have to pass your control-flow graph through some variation of the Relooper[1,2]. I don't know if upstream tcc can do that (there are apparently some forks?..).

[1] http://troubles.md/why-do-we-need-the-relooper-algorithm-aga...

[2] https://medium.com/leaningtech/solving-the-structured-contro...
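For a concrete picture of what has to be restructured, here is a minimal sketch of irreducible control flow in C: a cycle that can be entered at two different points. The function name `count_steps` and its parameters are made up for illustration.

```c
#include <assert.h>

/* Hypothetical example: the cycle {top, middle} has two entry points,
 * one by falling into `top` and one by jumping straight to `middle`.
 * A control-flow graph with such a cycle is irreducible. */
int count_steps(int enter_in_middle, int n)
{
    int steps = 0;

    assert(n >= 0);
    if (enter_in_middle)
        goto middle;        /* second entry into the cycle */
top:
    steps += 1;
middle:
    steps += 10;
    if (n-- > 0)
        goto top;           /* back edge */
    return steps;
}
```

A native backend can emit this as two labels and two jumps. A Wasm `loop` can only be entered at its top, so a compiler has to rewrite the function first, for instance by adding a dispatch variable or duplicating code.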


> you essentially can't target it with a single-pass compiler,

That might be true if your source language has goto, but for other languages that start with structured control flow, it's possible to just carry the structure through and emit Wasm directly from the AST.
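As a rough sketch of what "carry the structure through" could look like, the following walks a tiny structured AST once and appends flat WebAssembly text instructions to a buffer. The node layout and the names `struct node`, `struct buf`, `put` and `emit` are hypothetical; this is not taken from any compiler mentioned in the thread.

```c
#include <assert.h>
#include <string.h>

enum kind { K_OP, K_SEQ, K_IF, K_WHILE };

struct node {
    enum kind kind;
    const char *text;       /* K_OP: one instruction; K_IF/K_WHILE: condition */
    const struct node *a;   /* K_SEQ: first; K_IF: then-branch; K_WHILE: body */
    const struct node *b;   /* K_SEQ: second; K_IF: else-branch or NULL */
};

struct buf { char data[512]; size_t len; };

/* Append a string to the fixed-size output buffer. */
void put(struct buf *b, const char *s)
{
    size_t n = strlen(s);
    assert(b->len + n < sizeof b->data);
    memcpy(b->data + b->len, s, n + 1);
    b->len += n;
}

/* One recursive walk; every node is visited exactly once. */
void emit(const struct node *n, struct buf *b)
{
    if (n == NULL)
        return;
    switch (n->kind) {
    case K_OP:
        put(b, n->text); put(b, "\n");
        break;
    case K_SEQ:
        emit(n->a, b); emit(n->b, b);
        break;
    case K_IF:
        put(b, n->text); put(b, "\nif\n");
        emit(n->a, b);
        if (n->b) { put(b, "else\n"); emit(n->b, b); }
        put(b, "end\n");
        break;
    case K_WHILE:
        /* `br_if 1` leaves the outer block; `br 0` restarts the loop. */
        put(b, "block\nloop\n");
        put(b, n->text); put(b, "\ni32.eqz\nbr_if 1\n");
        emit(n->a, b);
        put(b, "br 0\nend\nend\n");
        break;
    }
}
```

Each structured construct maps onto a Wasm `block`, `loop` or `if` with the same nesting, which is why no control-flow graph is needed here. The sketch leaves out `break`, `continue` and anything like `goto`.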


Sure, I was speaking in the context of C specifically. (In non-simplistic compilers, you may not want to preserve the source structure anyway—e.g. in Scheme or Lua with tail calls all over the place.)


Presumably C's `switch` is also a problem.


Yes. I don't recall offhand all the confusing technicalities of what's allowed in `switch` statements in C, but here are a few brainfscks:

https://old.reddit.com/r/C_Programming/comments/16kg48y/mind...

https://old.reddit.com/r/programminghorror/comments/ylc7f3/w...


I went down a rabbithole and wow.

I found a comment that is apparently from the author of https://github.com/stclib/STC and then came across this example:

https://stackoverflow.com/a/76887723

  int coro_a(struct a* g)
  {
   cco_routine (g) {
    printf("entering a\n");
    for (g->i = 0; g->i < 3; g->i++) {
     printf("A step %d\n", g->i);
     cco_yield();
    }
    cco_final:
    printf("returning from a\n");
   }
   return 0; // done
  }
Running it through the preprocessor with `gcc -E -ISTC/include co.c` gives me this:

  int coro_a(struct a* g)
   {
    for (int* _state = &(g)->cco_state; *_state != CCO_STATE_DONE; *_state = CCO_STATE_DONE) _resume: switch (*_state) case 0: {
     printf("entering a\n");
     for (g->i = 0; g->i < 3; g->i++) {
      printf("A step %d\n", g->i);
      do { *_state = 14; return CCO_YIELD; goto _resume; case 14:; } while (0);
     }
     *_state = CCO_STATE_FINAL; case CCO_STATE_FINAL:
     printf("returning from a\n");
    }
    return 0;
   }


I don’t want to become the switch-statement guy, but neither can I resist, apparently. There are no special technicalities in what is allowed in a switch statement: the same things are allowed as with bare gotos. That is, a switch statement is a fancy goto, and case labels are just labels that look a bit funny. Except for the case labels being restricted to the inside of the switch body, nesting doesn’t really come into it.

So then the question becomes: which things are you allowed to jump over? In C++, I don’t really know; the restrictions seem fairly stringent. In C, you can jump over anything except a declaration using a variably modified type (i.e. a variable-length array, a pointer to one, etc.), but keep in mind that the variables whose declarations you’ve jumped over will be uninitialized even if the declaration does have an initializer.
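To make the "case labels are just labels" point concrete, here is a small sketch in which a case label sits inside the body of an `if (0)`. The function name `classify` is made up for illustration.

```c
/* Hypothetical example: `case 1` lives inside a block that normal
 * control flow never enters, yet the switch can jump straight to it. */
int classify(int x)
{
    int r = 0;

    switch (x) {
    case 0:
        if (0) {
    case 1:
            r += 10;    /* reached only by jumping to case 1 */
        }
        r += 1;         /* reached from both case 0 and case 1 */
        break;
    default:
        r = -1;
    }
    return r;
}
```

Here `classify(0)` skips the `if (0)` body, while `classify(1)` lands in the middle of it and then falls out into the shared tail. A compiler has to treat this as a jump into a nested block, which is the same problem `goto` poses.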


This is true. In Theta (https://github.com/ThetaLang/Theta) this is exactly what we do -- no need for more than one pass for the WASM codegen.


If all you want to do is compile and run C code in the browser, you could run tcc in the blink x86_64 emulator, running in wasm. It would take ~300 KB, less than the JS and CSS used in the average webpage.


The whole LLVM toolchain is a bit big. I think we can reduce the size much more. We actually looked into using tcc, but unfortunately tcc doesn’t have a wasm backend (for generating wasm output). It would be awesome if they added one!


Check out https://github.com/tyfkda/xcc. I've only used the native backend, but it's small and fast.


Nice! I didn’t know about the project. Thanks for sharing!


This project is also very much worth checking out.

https://cranelift.dev/

From the page:

Cranelift is a fast, secure, relatively simple and innovative compiler backend. It takes an intermediate representation of a program generated by some frontend and compiles it to executable machine code. Cranelift is meant to be used as a library within an "embedder".

It is in successful use by the Wasmtime WebAssembly virtual machine, for just-in-time (JIT) and ahead-of-time (AOT) compilation, and also as an experimental backend for the Rust compiler.

Cranelift is an optimizing compiler, but it aims to take a fresh look at which optimizations are necessary. We have explicitly avoided features -- such as advanced alias analysis or use of undefined behavior -- that have historically led to subtle miscompilations in other compilers. Cranelift consists of about 200 thousand lines of code; in contrast, e.g. LLVM consists of over 20 million lines of code, a hundred times larger. This difference also allows Cranelift to be relatively approachable to developers, researchers, auditors and others who wish to understand how it works.


I recently wanted to use tcc for a homebaked programming side project and was surprised to find it's no longer supported, at least not by Fabrice Bellard. Upstream git still has some light activity but no releases. I wasn't sure how good an idea it is to rely on it as a code generator.


It's alive and kicking, my friend: https://repo.or.cz/tinycc.git/shortlog

We're waiting for grischka to decide when to announce a new release: https://lists.nongnu.org/archive/html/tinycc-devel/2024-10/m...


I see, thanks. That's great.


clang can target wasm already.



