I Wrote a WebAssembly VM in C

31 comments

·February 3, 2025

syrusakbary

This is an interesting approach, great work!

For anyone who wants to see where the meat is, it's mostly in this file: https://github.com/irrio/semblance/blob/main/src/wrun.c

Thinking out loud, I think it would have been a great idea to conform to the Wasm-C-API (https://github.com/WebAssembly/wasm-c-api) as a standard interface for the project. Most Wasm runtimes (Wasmer, V8, wasmi, etc.) have adopted it, and since the API is already in C, it would make the project easier to try for developers familiar with that API.

Note for the author: if you feel familiar enough with Wasm and would like to contribute to Wasmer, we would also welcome any patches or improvements. Keep up the great work!

oguz-ismail

> Wasmer

> Installed-Size: 266 MB

What the hell

syrusakbary

Indeed, we need to further reduce the base binary size!

Most of the size comes from the LLVM backend, which is a bit heavy. Wasmer ships many backends by default; if you were to use Wasmer headless, it would be just a bit less than 1 MB.

If you want, you can always customize the build with only the backends that you are interested in using.

Note: I've seen some builds of LLVM under 5-10 MB, but those require heavy customization. It is clear that we still have some work to do to reduce the size of the general build!

davexunit

This was a fun read! I wrote a Wasm interpreter in Scheme a while back, so it makes me happy to see more people writing their own. It is less difficult than you might think. I encourage others to give the spec a look and give it a try. No need to implement every instruction, just enough to have fun.

whizzter

One tip for the author from another implementer: the spec test suite contains various odd forms of textual Wasm that aren't obvious how to compile, but the wast2json converter can produce a simpler JSON description accompanied by regular binary Wasm files.

bhelx

Same tip here. We did this with Chicory: https://github.com/dylibso/chicory

I'd add to that: the earlier you can get this test suite running, the better for the iteration speed and correctness of your project.

It took a bit of time to make everything work, but once we did, we very quickly got to the point of running anything. The test-suite is certainly incomplete but gets you 95% there: https://github.com/WebAssembly/testsuite

pcmoore

I found this article on direct WASM interpretation very interesting: https://arxiv.org/abs/2205.01183

I produced https://github.com/peterseymour/winter on the back of it and learnt WASM is not as simple as it should be.

deivid

This is a really nice write up! It's giving me motivation to go back to my WASM implementation

autumnlani

This is awesome. Nicely done

greasy

This is awesome.

pdubroy

This is great! The WebAssembly Core Specification is actually quite readable, although some of the language can be a bit intimidating if you're not used to reading programming language papers.

If anyone is looking for a slightly more accessible way to learn WebAssembly, you might enjoy WebAssembly from the Ground Up: https://wasmgroundup.com

(Disclaimer: I'm one of the authors)

amw-zero

I think it's much better to just learn how to read inference rules. They're actually quite simple, and are used ubiquitously to define PL semantics.

Treating that as "not an option" is a big waste of time. Learning to read them will open up all of the literature written on the subject.

shpongled

The WASM spec is so well defined presumably because Andreas Rossberg is the editor, and he did a bunch of PL research on extensions to Standard ML, which is famous for its specification!

veltas

> actually quite readable, although some of the language can be a bit intimidating if you're not used to reading programming language papers

You're more generous than me, I think it's rubbish.

Would have been easier to read if they had written it more like an ISA manual.

mananaysiempre

You can understand the WASM spec in your sleep if you’ve ever worked through a type-system paper from the last two decades (or a logic paper from even earlier I guess).

Granted, not many people have, but there’s a reason why it makes sense for it to be written in that style: they want it to be very clear that the verification (typechecking, really) algorithm doesn’t have any holes, and for that it’s reasonable to speak the language of the people who prove that type of thing for a living.

The WASM spec is also the ultimate authoritative reference for both programmers and implementers. That’s different from the goals of an ISA manual, which usually only targets programmers and just says “don’t do that” for certain dark corners of the (sole) implementation. (The RISC-V manual is atypical in this respect; still, I challenge you to describe e.g. which PC value the handler will see if the user code traps on a base RV32IMA system.)

amw-zero

This is an opportunity to learn. The way WebAssembly is defined is the standard way PL semantics are defined.

MuffinFlavored

I know one of WebAssembly's biggest features by design is security / "sandbox".

But I've always gotten confused with... it is secure because by default it can't do much.

I don't quite understand how to view WebAssembly. You write in one language, it compiles things like basic math (nothing with network or filesystem) to another and it runs in an interpreter.

I feel like I have a severe gap in understanding. There has been a ton of hype for years and lots of investment... but is it the case that anywhere you'd want to add Lua to an app, you could add WebAssembly instead, and vice versa?

jeroenhd

WebAssembly can communicate through buffers. WebAssembly can also import foreign functions (JavaScript functions in the browser).

You can get output by reading the buffer at the end of execution or when receiving callbacks. So, for instance, you pass a few frames' worth of buffers to WASM, WASM renders pixels into the buffers and calls a callback, and the JavaScript reads data from the buffer (sending it to a <canvas> or similar).

The benefit of WASM is that it can't be very malicious by itself. It requires the runtime to provide it with host functions and callbacks (its imports) to do any file I/O, network I/O, or spawning of new tasks. Lua and similar tools can go deep into the runtime they exist in, altering system state and messing with system memory if they want to, while WASM can only interact with the specific API surface you provide it.

That makes WASM less powerful but more predictable, and in my opinion better for building integrations, as there is no risk of internal APIs being accessed (which you would be blamed for if they broke in an update).
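To make the "specific API surface" idea concrete, here is a minimal toy model in C, not a real Wasm runtime. Every name in it (`GuestEnv`, `HostState`, `frame_ready`, `guest_render`, `run_one_frame`) is hypothetical. The guest stand-in gets only its own memory and one imported callback. In a real Wasm VM that restriction is enforced by the runtime; in this native sketch it is only a convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, not a real runtime API. */

enum { MEM_SIZE = 64 };

/* The complete set of capabilities the host chooses to hand to the guest. */
typedef struct {
    uint8_t *memory;      /* guest-visible linear memory */
    size_t memory_size;
    /* The single import: "a frame is ready at [offset, offset + len)". */
    void (*frame_ready)(void *host_ctx, uint32_t offset, uint32_t len);
    void *host_ctx;
} GuestEnv;

/* Host-side state; the guest reaches it only through frame_ready. */
typedef struct {
    const uint8_t *memory;
    size_t memory_size;
    uint32_t checksum;
    int frames_seen;
} HostState;

/* Host callback: validate the range the guest reported, then read the pixels. */
static void host_frame_ready(void *ctx, uint32_t offset, uint32_t len) {
    HostState *host = ctx;
    if (offset > host->memory_size || len > host->memory_size - offset) {
        return; /* never trust offsets coming from the guest */
    }
    for (uint32_t i = 0; i < len; i++) {
        host->checksum += host->memory[offset + i];
    }
    host->frames_seen++;
}

/* Stand-in for guest code: it writes to its memory and calls its one import. */
static void guest_render(GuestEnv *env, uint8_t shade) {
    uint32_t len = 16;
    assert(len <= env->memory_size);
    memset(env->memory, shade, len);
    env->frame_ready(env->host_ctx, 0, len);
}

/* Wire host and guest together, render one frame, return the host's checksum. */
static uint32_t run_one_frame(uint8_t shade, int *frames_seen) {
    uint8_t memory[MEM_SIZE] = {0};
    HostState host = { memory, sizeof memory, 0, 0 };
    GuestEnv env = { memory, sizeof memory, host_frame_ready, &host };
    guest_render(&env, shade);
    *frames_seen = host.frames_seen;
    return host.checksum;
}
```

Calling `run_one_frame(2, &frames)` renders 16 pixels of shade 2, so the host sees one frame and a checksum of 32. Anything the guest should not do simply has no entry in `GuestEnv`.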

brabel

> Lua and similar tools can go deep into the runtime they exist in, altering system state and messing with system memory if they want to

That's not correct. When you embed Lua, you choose which APIs are available; to make the full stdlib available you must explicitly call `luaL_openlibs` [1].

[1] https://www.lua.org/manual/5.3/manual.html#luaL_openlibs

panic

I don’t believe it is currently possible for a WebAssembly instance to access any buffer other than its own memory. You have to copy data in and out.
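As a rough sketch of what "copy data in and out" looks like on the host side, assuming a hypothetical `LinearMemory` struct (not the API of any particular runtime): the host checks the guest-supplied offset against the memory size before every copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side view of one instance's linear memory. */
typedef struct {
    uint8_t *data;
    size_t size;
} LinearMemory;

/* True if [offset, offset + len) lies entirely inside the memory.
 * Written to avoid overflow in offset + len. */
static bool range_ok(const LinearMemory *m, uint32_t offset, size_t len) {
    return offset <= m->size && len <= m->size - offset;
}

/* Copy host bytes into guest memory; refuse out-of-range requests. */
static bool copy_in(LinearMemory *m, uint32_t offset, const void *src, size_t len) {
    if (!range_ok(m, offset, len)) {
        return false;
    }
    memcpy(m->data + offset, src, len);
    return true;
}

/* Copy guest bytes out to a host buffer; refuse out-of-range requests. */
static bool copy_out(const LinearMemory *m, uint32_t offset, void *dst, size_t len) {
    if (!range_ok(m, offset, len)) {
        return false;
    }
    memcpy(dst, m->data + offset, len);
    return true;
}
```

With an 8-byte memory, copying 4 bytes to offset 4 succeeds, while copying 4 bytes to offset 6 is rejected without touching anything.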

Karellen

> You write in one language

Not quite. WebAssembly isn't a source language, it's a compiler target. So you should be able to write in C, Rust, Fortran, or Lua and compile any of those to WebAssembly.

Except that WebAssembly is a cross-platform assembly language/machine code which is very similar to the native machine code of many/most contemporary CPUs. This means a WebAssembly interpreter can be very straightforward, and could often translate one WebAssembly instruction to one native CPU instruction. Or rather, it can compile a stream of WebAssembly instructions almost one-to-one to native CPU instructions, which it can then execute directly.

whizzter

A JIT should be able to translate most arithmetic and binary instructions to single opcodes. However, anything involving memory and function calls needs safety checks that become multi-instruction sequences. Branches could mostly be direct _unless_ the runtime has some kind of metering (it should) to stop infinite loops (if it wants to be crash-safe in addition to exploit-safe).
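A minimal sketch of those two kinds of checks, written as an interpreter loop rather than a JIT. The instruction set here (`OP_PUSH`, `OP_ADD`, `OP_LOAD8`, `OP_JUMP`, `OP_HALT`) is a made-up toy, not real Wasm opcodes or encoding, and fuel is charged per instruction for simplicity rather than only on branches.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy instruction set: hypothetical, not real Wasm opcodes. */
typedef enum { OP_PUSH, OP_ADD, OP_LOAD8, OP_JUMP, OP_HALT } Op;
typedef enum { RUN_OK, TRAP_OUT_OF_BOUNDS, TRAP_OUT_OF_FUEL, TRAP_BAD_PROGRAM } Status;

typedef struct {
    Op op;
    uint32_t arg; /* immediate for OP_PUSH, target index for OP_JUMP */
} Instr;

enum { STACK_MAX = 16 };

static Status run(const Instr *code, size_t code_len,
                  const uint8_t *mem, size_t mem_size,
                  uint64_t fuel, uint32_t *result) {
    uint32_t stack[STACK_MAX];
    size_t sp = 0;
    size_t pc = 0;

    for (;;) {
        if (pc >= code_len) return TRAP_BAD_PROGRAM;
        /* Metering: each instruction costs one unit, so loops cannot run forever. */
        if (fuel == 0) return TRAP_OUT_OF_FUEL;
        fuel--;

        Instr in = code[pc++];
        switch (in.op) {
        case OP_PUSH:
            if (sp == STACK_MAX) return TRAP_BAD_PROGRAM;
            stack[sp++] = in.arg;
            break;
        case OP_ADD: /* plain arithmetic: no safety check beyond stack shape */
            if (sp < 2) return TRAP_BAD_PROGRAM;
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_LOAD8: /* memory access: bounds check before touching memory */
            if (sp < 1) return TRAP_BAD_PROGRAM;
            if (stack[sp - 1] >= mem_size) return TRAP_OUT_OF_BOUNDS;
            stack[sp - 1] = mem[stack[sp - 1]];
            break;
        case OP_JUMP:
            pc = in.arg;
            break;
        case OP_HALT:
            if (sp < 1) return TRAP_BAD_PROGRAM;
            *result = stack[sp - 1];
            return RUN_OK;
        default:
            return TRAP_BAD_PROGRAM;
        }
    }
}
```

A real Wasm engine validates code ahead of time, so the stack-shape checks in this toy would not be needed at run time. The bounds check and the fuel counter are the parts that correspond to the comment above.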

beardyw

Yes, interpretation on the fly was never its intention. The intention was to provide interpreted languages with a way to implement fast compiled functions.

coliveira

I think the biggest advantage of wasm in terms of security is that it doesn't accept machine code written for the target machine, only this artificial machine language. This means that it cannot encode arbitrary code that could be executed by the host machine. Everything it runs necessarily has to go through the wasm interpreter.

pdubroy

You should check out the book :-)

We have a chapter called "What Makes WebAssembly Safe?" which covers the details. You can get a sneak peek here: https://bsky.app/profile/wasmgroundup.com/post/3lh2e4eiwnm2p
