In-Memory Filesystems in Rust

32 comments

August 24, 2025

mncalc7

Why not just use tmpfs?

kolektiv

It always surprised me somewhat that there isn't a set of traits covering some kind of `fs` like surface. It's not a trivial surface, but it's not huge either, and I've also found myself in a position of wanting to have multiple implementations of a filesystem-like structure (not even for the same reasons).

Tricky to make that kind of change to the std lib now, I appreciate, but it seems like an odd gap.
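To make the idea of a trait over an `fs`-like surface concrete, here is a minimal sketch. The `FileSystem` trait, its three methods, and the `OsFs`, `MemFs`, and `append_line` names are all hypothetical; nothing like this exists in std, and a realistic surface would also need directories, metadata, and streaming handles.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical trait covering a tiny slice of the `std::fs` surface.
/// Neither the name nor the method set comes from std or any crate.
pub trait FileSystem {
    fn read(&self, path: &Path) -> io::Result<Vec<u8>>;
    fn write(&mut self, path: &Path, data: &[u8]) -> io::Result<()>;
    fn exists(&self, path: &Path) -> bool;
}

/// Backend that forwards every call to the real filesystem.
pub struct OsFs;

impl FileSystem for OsFs {
    fn read(&self, path: &Path) -> io::Result<Vec<u8>> {
        fs::read(path)
    }
    fn write(&mut self, path: &Path, data: &[u8]) -> io::Result<()> {
        fs::write(path, data)
    }
    fn exists(&self, path: &Path) -> bool {
        path.exists()
    }
}

/// Backend that keeps whole files in a map (no directories, no permissions).
#[derive(Default)]
pub struct MemFs {
    files: HashMap<PathBuf, Vec<u8>>,
}

impl FileSystem for MemFs {
    fn read(&self, path: &Path) -> io::Result<Vec<u8>> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such file"))
    }
    fn write(&mut self, path: &Path, data: &[u8]) -> io::Result<()> {
        self.files.insert(path.to_path_buf(), data.to_vec());
        Ok(())
    }
    fn exists(&self, path: &Path) -> bool {
        self.files.contains_key(path)
    }
}

/// Code written against the trait runs unchanged on either backend.
pub fn append_line<F: FileSystem>(store: &mut F, path: &Path, line: &str) -> io::Result<()> {
    let mut data = if store.exists(path) { store.read(path)? } else { Vec::new() };
    data.extend_from_slice(line.as_bytes());
    data.push(b'\n');
    store.write(path, &data)
}
```

Calling `append_line(&mut MemFs::default(), ...)` in a test and `append_line(&mut OsFs, ...)` in production is the kind of swap such a trait would allow.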

lenkite

Go has a basic FS abstraction in the standard library: https://dev.to/rezmoss/gos-fs-package-modern-file-system-abs...

But the stdlib one is a bit barebones. So people created: https://github.com/spf13/afero

Joker_vD

See David R. Hanson's "A Portable File Directory System" [0][1], for example: a 700-line implementation of early UNIX's filesystem API that piggybacks on some sort of pre-existing (block-oriented) I/O primitives, which means you can do it entirely in memory with about another 300 lines of code or so.

I suspect that with OSes becoming much more UNIX-like the demand for such abstraction layers shrank almost to nothing.

[0] https://drh.github.io/documents/pds-spe.pdf

[1] https://drh.github.io/documents/pds.pdf

ozgrakkurt

Mocking the file system or network seems counterproductive.

Complicated logic can be in pure functions and not be intertwined with IO if it needs to be tested.
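One way to read this suggestion is sketched below: the logic takes any `BufRead`, and only a thin wrapper opens a real file. The function names are made up for illustration.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Logic under test: counts non-blank lines from any buffered reader.
/// It never touches the filesystem itself.
pub fn count_nonblank_lines<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        if !line?.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}

/// Thin IO shell: the only part that opens a real file.
pub fn count_nonblank_lines_in_file(path: &Path) -> io::Result<usize> {
    let file = File::open(path)?;
    count_nonblank_lines(BufReader::new(file))
}
```

A test can pass a byte slice such as `"a\n\nb\n".as_bytes()` directly, since `&[u8]` implements `BufRead`, so no mock filesystem is involved.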

Mocking IO seems like it won’t really capture the problems you might encounter in reality anyway.

Arch-TK

Fault injection is much easier if you can mock IO. And you aren't really testing your software if you're not injecting faults.
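As a rough illustration of fault injection through mocked IO, the sketch below wraps any `Write` and starts failing after a byte budget is used up. `FailAfter` and `save_record` are hypothetical names, and a real harness would probably inject more kinds of failure (short reads, `Interrupted`, failing `flush`).

```rust
use std::io::{self, Write};

/// Hypothetical test helper: passes writes through until `budget` bytes
/// have been written, then returns an error on every further write.
pub struct FailAfter<W> {
    inner: W,
    budget: usize,
}

impl<W: Write> FailAfter<W> {
    pub fn new(inner: W, budget: usize) -> Self {
        FailAfter { inner, budget }
    }

    /// Gives back the wrapped writer so a test can inspect what got through.
    pub fn into_inner(self) -> W {
        self.inner
    }
}

impl<W: Write> Write for FailAfter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.budget == 0 {
            return Err(io::Error::new(io::ErrorKind::Other, "injected fault"));
        }
        // Accept at most `budget` bytes, like a device that is nearly full.
        let n = buf.len().min(self.budget);
        let written = self.inner.write(&buf[..n])?;
        self.budget -= written;
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Example code under test: must report an error if the write is cut short.
pub fn save_record<W: Write>(mut out: W, payload: &[u8]) -> io::Result<()> {
    out.write_all(payload)?;
    out.flush()
}
```

Wrapping a `Vec<u8>` with a budget smaller than the payload lets a test check both that `save_record` reports the failure and how much partial data was left behind.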

kolektiv

It's not always about mocking (in my cases it hasn't been). Sometimes it is about multiple "real" implementations - a filesystem is itself an abstraction, and a very common one, it seems like it would at least sometimes be useful to be able to leverage that more flexibly.

eru

I have a little system that takes a .git and mounts it as a fuse filesystem. Every commit becomes a directory with a snapshot of the project at that point in time.

You could read the whole .git in at once, and then you'd have an in-memory file-system, if you wanted to.

In any case, I agree with you: it's not about mocking.

kryptiskt

Some examples where it would be useful: Exposing a zip file or exe-embedded data as a filesystem, or making an FS backed by 9P, WebDAV or SFTP.
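For the exe-embedded case, a minimal sketch might look like the following. The `EMBEDDED` table, its paths, and both function names are invented for illustration; real code would more likely fill the table with `include_bytes!` than with literals.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical table of files baked into the binary. Byte-string literals
/// stand in for `include_bytes!("...")` to keep the sketch self-contained.
static EMBEDDED: &[(&str, &[u8])] = &[
    ("/config/default.toml", b"name = \"demo\"\n"),
    ("/templates/hello.txt", b"Hello, world!\n"),
];

/// Opens an embedded file as a reader, loosely mirroring `File::open`.
pub fn open_embedded(path: &str) -> io::Result<Cursor<&'static [u8]>> {
    EMBEDDED
        .iter()
        .find(|(name, _)| *name == path)
        .map(|(_, data)| Cursor::new(*data))
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "not embedded"))
}

/// Convenience helper analogous to `std::fs::read_to_string`.
pub fn read_embedded_to_string(path: &str) -> io::Result<String> {
    let mut text = String::new();
    open_embedded(path)?.read_to_string(&mut text)?;
    Ok(text)
}
```

Because `Cursor` implements `Read` and `Seek`, callers that only need those traits cannot tell the difference from a file on disk.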

stingraycharles

Same as mocking databases: yes, they make your tests run faster and more “pure”, but you’re suddenly not really testing reality anymore.

boomlinde

Sure, but they're not mutually exclusive approaches. Having tests that run in a couple of seconds thanks to low I/O and cheap scaffolding setup/teardown can be a valuable addition to slower, comprehensive system and integration test suites.

bhawks

POSIX file system semantics are very complex. An in-memory implementation is likely to have quality gaps that make it suboptimal. If you want fast tests, /tmp is likely to be backed by tmpfs in memory. If you are paranoid, you can use /dev/shm to be explicit about your desire.

Going this route means you're going to leverage all the well tested Linux VFS code and your tests will execute with higher fidelity.

mpweiher

Yeah, SSDs are really fast. Have been for a while now, so fast that system call and other kernel overheads easily dominate unless you take care:

https://blog.metaobject.com/2017/02/mkfile8-is-severely-sysc...

That was 8 years ago, and even then mkfile needed a 512K buffer size to saturate the hardware. With the 512 byte default buffer it was 8x slower than the hardware.

In addition, as others have pointed out, if you are not doing something extra to ensure things are flushed to disk, you are just measuring the buffer cache in the first place.
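The effect of buffer size on the number of writes can be sketched without touching a disk. `CountingSink` and `writes_needed` below are hypothetical helpers; the sink stands in for a `File`, where each `write` call would be one write(2) syscall. This counts calls only and says nothing about actual timings.

```rust
use std::io::{self, BufWriter, Write};

/// Stand-in for a file: discards data but counts `write` calls.
#[derive(Default)]
pub struct CountingSink {
    pub calls: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Pushes `total` bytes in 64-byte pieces through a `BufWriter` of the given
/// capacity and returns how many writes actually reached the sink.
pub fn writes_needed(capacity: usize, total: usize) -> io::Result<usize> {
    let mut out = BufWriter::with_capacity(capacity, CountingSink::default());
    let piece = [0u8; 64];
    let mut remaining = total;
    while remaining > 0 {
        let n = remaining.min(piece.len());
        out.write_all(&piece[..n])?;
        remaining -= n;
    }
    out.flush()?;
    Ok(out.get_ref().calls)
}
```

Comparing `writes_needed(512, 1 << 20)` with `writes_needed(512 * 1024, 1 << 20)` shows how many fewer calls the larger buffer needs for the same megabyte of data.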

adastra22

I see no mention of fsync/sync_all. That’s why your disk file system is acting as fast as your in-memory file system (for small tests). Both are effectively in-memory.

indirect

I guess I wasn't sufficiently clear in the post, but the part I think is interesting is not that tmpfs and SSD bench at the same speed. I am aware of in-memory filesystem caches, and explicitly mention them twice in the last few paragraphs.

The interesting part, to me, was that using the vfs crate or the rsfs crate didn't produce any differences from using tmpfs or an SSD. In theory, those crates completely cut out the actual filesystem and the OS entirely. Somehow, avoiding all those syscalls didn't make it any faster? Not what I expected.

Anyway, if you have examples of in-process filesystem mocks that run faster than the in-memory filesystem cache, I'd love to hear about them.

maxbond

A Rust-specific danger is that, if you don't explicitly sync a file before dropping it, any errors from syncing are ignored. So if you care about atomicity, call e.g. `File::sync_all()`.
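A minimal sketch of the explicit call, with `write_durably` as a made-up helper name. The std documentation for `File` notes that errors detected on closing are ignored by its `Drop` implementation, so calling `sync_all` before the file goes out of scope is how such errors reach the caller.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Writes `data` and asks the OS to push it to the device before returning,
/// so a failed sync is reported to the caller instead of being lost on drop.
pub fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(data)?;
    // Without this call, any late write-back error would go unnoticed
    // when `file` is dropped at the end of the function.
    file.sync_all()?;
    Ok(())
}
```

This also makes a disk benchmark pay for the device write rather than only for the page cache.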

dezgeg

Is that really Rust-specific? I would be really surprised if any other languages do fsync() in their destructors either.

maxbond

To be clear `File::drop()` does sync, it just ignores errors (because `drop()` doesn't have a way of returning an error). It's not really Rust specific I guess, I just don't know off the top of my head what other languages behave this way.

the8472

On most filesystems close(2) is nearly a noop, so even if you surfaced errors from close it returning successfully would not guarantee an absence of errors.

close without fsync (or direct IO) essentially is telling the OS that you don't need immediate durability and prefer performance instead.

silon42

I'd almost never want to do fsync in normal code (unless implementing something transactional)... but I'd want an explicit close almost always (or drop should panic/abort).

01HNNWZ0MV43FF

For context - cppreference.com doesn't say anything about `fstream` syncing on drop, but it does have an explicit `sync` function. `QFile` from Qt doesn't even have a sync function, which I find odd.

aw1621107

I had always assumed that fstream flushes on destruction, but after digging through the standard all I can conclude is that I'm confused.

According to the standard, fstream doesn't have an explicit destructor, but the standard says "It uses a basic_filebuf<charT, traits> object to control the associated sequences." ~basic_filebuf(), in turn, is defined to call close() (which I think flushes to disk?) and swallow exceptions.

However, I can't seem to find anything that explicitly ties the lifetime of the fstream to the corresponding basic_filebuf. fstream doesn't have an explicitly declared destructor and the standard doesn't require that the basic_filebuf is a member of fstream, so the obvious ways the file would be closed don't seem to be explicitly required. In addition, all of fstream's parents' destructors are specified to perform no operations on the underlying rdbuf(). Which leaves... I don't know?

cppreference says the underlying file is closed, though, which should flush it. And that's what I would expect for an RAII class! But I just can't seem to find the requirement...

goodpoint

This is not correct. Programming languages do not and should not call sync automatically.

znpy

so the good old `sync; sync; sync;` ?

IshKebab

I guess the overhead from the syscalls simply wasn't that significant. Syscalls aren't that slow (~0.5us). This never says how many you were doing.

ozgrakkurt

Also kernel code is probably more optimized than a mock library

eumon

You could try /dev/shm for testing purposes. It is effectively an in-memory filesystem that Linux provides, and it is very performant.

j1elo

> It turns out the intended primary use case of the crate is to store files inside Rust binaries but still have an API sort of like the filesystem API to interact with them. Unfortunately, that information is hidden away in a comment on a random GitHub issue, rather than included in the project readme.

A+ on technical prowess,

F- on being able to articulate a couple words about it on a text file.

the8472

> but all my benchmarks seem to disagree.

Well, benchmarks could be wrong or misleading. Did you make sure that the IO actually happens and that it dominates the process execution time?