Making Libcurl Work in WebAssembly

vk6

I did a similar project recently, although it was more focused on getting a good JavaScript API out of libcurl, rather than integrating with a different language like R: https://github.com/ading2210/libcurl.js

My first approach for networking was also to use SOCKS5 through a WebSocket. However, this turns out to be really slow. Each new connection created by Emscripten requires waiting for: the TLS handshake from the browser to your proxy, the WebSocket handshake which takes place over HTTP/1.1, the SOCKS5 handshake on the WebSocket, and the TLS handshake from libcurl to the destination server.

That's a lot of round trips just for a single request! In practice, if the proxy server isn't physically close to you, the latency can be multiple seconds. This is partially mitigated by the fact that libcurl can use HTTP/2 to reuse that socket, but if you're placing requests to different hosts, or to hosts that don't support HTTP/2, this is a huge problem.
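To put rough numbers on that, here is a back-of-the-envelope sketch. The per-step round-trip counts are my assumptions (TLS 1.3, SOCKS5 without authentication, plus the TCP handshake and the HTTP exchange itself), and it ignores the extra leg between the proxy and the destination, so treat it as a lower bound rather than a measurement.

```python
# Rough latency model for one request over SOCKS5-in-WebSocket.
# The round-trip counts below are assumptions, not measurements.
ROUND_TRIPS = [
    ("TCP handshake, browser to proxy", 1),
    ("TLS handshake, browser to proxy", 1),
    ("WebSocket upgrade over HTTP/1.1", 1),
    ("SOCKS5 greeting and connect request", 2),
    ("TLS handshake, libcurl to destination", 1),
    ("HTTP request and response", 1),
]


def total_round_trips(steps=ROUND_TRIPS):
    """Sum the round trips needed before the first response arrives."""
    return sum(count for _, count in steps)


def estimated_latency_ms(rtt_ms, steps=ROUND_TRIPS):
    """Estimate the time to first response for a given browser-to-proxy RTT."""
    return rtt_ms * total_round_trips(steps)
```

With a 150 ms round trip to the proxy, `estimated_latency_ms(150)` already comes out above one second for a single request.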

The solution is to make it so that multiple TCP sockets can share the same WebSocket, and then minimize round trips in the proxy protocol. I wrote a new protocol for this purpose here: https://github.com/MercuryWorkshop/wisp-protocol

It basically acts like multiplexed SOCKS5 over a WebSocket. One trick that it uses to reduce latency further is for the client to simply assume that creating a new socket succeeded and to start sending data immediately, which eliminates another round trip. So apart from the very first connection, which establishes the WebSocket, there is zero added latency for new sockets.
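The multiplexing and the "assume it worked" trick can be sketched roughly like this. The frame layout, the type constants, and the `MuxClient` class are hypothetical and simplified for illustration; this is not the actual Wisp wire format, so check the spec for the real thing.

```python
import struct

# Hypothetical frame types for illustration; not the Wisp wire format.
CONNECT, DATA = 1, 2


def encode_frame(frame_type, stream_id, payload=b""):
    """Pack one frame: 1-byte type, 4-byte little-endian stream id, payload."""
    return struct.pack("<BI", frame_type, stream_id) + payload


def decode_frame(frame):
    """Split a frame back into (type, stream id, payload)."""
    frame_type, stream_id = struct.unpack_from("<BI", frame)
    return frame_type, stream_id, frame[5:]


class MuxClient:
    """Shares one transport (e.g. a WebSocket) between many logical streams."""

    def __init__(self, send):
        self._send = send  # callable that writes one message to the transport
        self._next_id = 1

    def open_stream(self, host, port):
        stream_id = self._next_id
        self._next_id += 1
        self._send(encode_frame(CONNECT, stream_id, f"{host}:{port}".encode()))
        # Returned immediately: the client does not wait for an acknowledgement.
        return stream_id

    def send_data(self, stream_id, data):
        self._send(encode_frame(DATA, stream_id, data))
```

Because `open_stream` returns without waiting, the first `send_data` call goes out right behind the connect frame, and a failed connection would have to be reported later by the server instead of up front.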

Actually getting Emscripten to use this is slightly cursed, and you need to patch the generated JavaScript using some regex. I could probably get this upstreamed in Emscripten someday though.
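As a rough idea of what that kind of patching looks like, here is a minimal sketch of a build step. The JavaScript line, the regex, and the `MuxSocket` name are all made up for illustration; the real generated code varies between Emscripten versions, and the actual patches are in the libcurl.js repo.

```python
import re

# Made-up stand-in for a line of generated glue code; real output differs.
GENERATED_JS = "var ws = new WebSocket(url, opts);"


def patch_websocket_constructor(js_source, replacement="MuxSocket"):
    """Swap `new WebSocket(` for a hypothetical drop-in class name."""
    return re.sub(r"\bnew\s+WebSocket\s*\(", f"new {replacement}(", js_source)
```

A pattern like this is fragile by nature, which is why getting a proper hook upstream would be the nicer fix.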

Also, it turns out that when writing this sort of network proxy, it doesn't really matter what language you use. The bottleneck ends up being the Linux TCP stack. You might think that a hyper-optimized Rust or Go based WebSocket proxy would be faster, but I found that the Wisp proxy server I wrote in Python was on par with the one written in Rust in synthetic tests. Even the slowest implementations get upwards of 2 Gbit/s of throughput (on slow CPUs), which can saturate the NICs of almost all VPS providers.

kamranjon

Sorry if this is obvious, but I read the article and am still a bit unsure. If you use libcurl on the front end to download a file using this method, where does the file end up? Is it in the browser's memory? Is it piped through WebSockets to some backend service? Is it written to local disk using the newish file system API?

NoThisIsMe

From the article

> What this code does is read an index file that contains the list of R packages from CRAN, and subsequently download the description files of the first 200 packages to the user home directory (which is actually a virtual filesystem in WebR [1]).
>
> [1] https://docs.r-wasm.org/webr/latest/mounting.html

So I think it's a virtual filesystem in browser memory.

oso2k

You might use a data URL to allow the file to be downloaded. Gemini gave me a recommendation on how to do this with this query.

https://www.google.com/search?q=use+data+url+to+download+fil...
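For reference, a data URL is just the MIME type plus the base64-encoded bytes, so building one is short. This is a minimal sketch; the function name and the default MIME type are my own choices.

```python
import base64


def make_data_url(content: bytes, mime_type: str = "application/octet-stream") -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

In the browser, the bytes would come from the virtual filesystem and the resulting URL would go on a link with a `download` attribute. For large files, a Blob with `URL.createObjectURL` is usually the better fit, since a data URL holds the whole file as one string.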

therein

> use data url to download file

It is kinda funny and kinda sad that you thought this was worth sharing.

Gemini says use google with this query. Really? Wow. Revolutionary. What did we do before LLMs?

immibis

Why do you need libcurl to work in WebAssembly... when you're already running in a browser?

(The answer: to run third-party code that uses libcurl because it isn't designed to run in web browsers)

RandomRandy

One advantage over using fetch is that the WebAssembly approach seems to bypass CORS

> If you inspect the devtools network tab of your browser, you see that everything happens over a single WebSocket to wss://ws.r-universe.dev. The browser is not making the HTTP requests, in fact this would not even be possible because we download the files from a host that does not enable CORS.

roywiggins

You don't need websockets or wasm for that of course:

https://github.com/Shivam010/bypass-cors

As long as the browser is talking to a server that's setting the correct CORS headers, that server can of course forward those requests to whatever third party server it wants.
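A minimal sketch of that idea using only the Python standard library. The `?url=` query format, the handler name, and the port are assumptions for illustration, and a real deployment would need an allowlist and error handling for failed upstream requests.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def with_cors_headers(headers):
    """Return a copy of the headers with a permissive CORS header added."""
    out = dict(headers)
    out["Access-Control-Allow-Origin"] = "*"
    return out


class CorsProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical query format: /?url=https://example.com/data.json
        target = parse_qs(urlparse(self.path).query).get("url", [None])[0]
        if not target or not target.startswith(("http://", "https://")):
            self.send_error(400, "missing or invalid url parameter")
            return
        # The proxy makes the request itself, so it sees everything in plain text.
        with urlopen(target) as upstream:
            body = upstream.read()
            content_type = upstream.headers.get("Content-Type", "application/octet-stream")
        self.send_response(200)
        headers = {"Content-Type": content_type, "Content-Length": str(len(body))}
        for name, value in with_cors_headers(headers).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the proxy on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CorsProxyHandler).serve_forever()
```

Calling `serve()` starts it; the browser then fetches `http://127.0.0.1:8080/?url=...` instead of the third-party URL.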

vk6

Classic CORS proxies are bad for privacy though. They read the contents of the forwarded requests in plain text, which might include API keys or other secrets. That's a real problem, since the typical use case for a CORS proxy is when you're unable to host your own backend.

With this kind of solution, the proxy only deals with the data in the underlying TCP socket. That data will be encrypted with TLS until it gets to the destination server. In this case, you don't need to fully trust the proxy server to use it safely.
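To make the difference concrete, this is roughly all a TCP-level relay has to do with the traffic. It is a minimal one-direction sketch with a hypothetical `pump` helper; a real proxy runs one of these in each direction concurrently.

```python
import socket


def pump(src, dst, chunk_size=4096):
    """Forward bytes from src to dst until src reaches EOF; return the count."""
    total = 0
    while True:
        chunk = src.recv(chunk_size)
        if not chunk:
            break
        # Forwarded as-is: the relay never parses or decrypts the payload.
        dst.sendall(chunk)
        total += len(chunk)
    dst.shutdown(socket.SHUT_WR)  # pass the EOF along to the other side
    return total
```

Since the TLS session is between libcurl and the destination, the bytes passing through `pump` are ciphertext as far as the proxy is concerned. It still learns which host and port you connect to, just not the contents.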

aaroninsf

That's... interesting!

nticompass

The "real" answer: because you can.
