Skip to content(if available)orjump to list(if available)

SQLite Disk Page Explorer

SQLite Disk Page Explorer

17 comments

·February 6, 2025

electroly

It was revelatory to write an SQLite virtual filesystem module for a storage layer with extremely high latency (S3). Every seek matters. This would have helped; I looked at the reads executed by SQLite and tried to intuit what was going on inside the file.

Protip: Use WITHOUT ROWID with monotonic IDs if you don't want your rows sprayed randomly around the file! That one change was the difference between SQLite-on-S3 being unusably slow and being fast enough. WITHOUT ROWID tables let you manage the physical clustering.

koeng

What are the disadvantages of using WITHOUT ROWID?

electroly

The main advantage is also its biggest disadvantage: you CAN control the clustering of the rows... and you MUST control the clustering of the rows. SQLite won't give you an auto-incremented ID on a WITHOUT ROWID table. You have to assign the IDs yourself and SQLite won't help you.

ustad

Do you have a url describing your protip?

electroly

The only thing I really have is the SQLite docs, and some of the advice in here does not match my own testing and I would recommend the opposite.

https://www.sqlite.org/withoutrowid.html

In particular, they say:

> WITHOUT ROWID tables will work correctly (that is to say, they provide the correct answer) for tables with a single INTEGER PRIMARY KEY. However, ordinary rowid tables will run faster in that case. Hence, it is good design to avoid creating WITHOUT ROWID tables with single-column PRIMARY KEYs of type INTEGER.

This has not proven correct in my testing, but perhaps other applications are different. IMO, you _should_ use WITHOUT ROWID on your tables with single-column PRIMARY KEYs of type INTEGER. With a 100ms seek time on S3 requests it's obvious that the WITHOUT ROWID table with monotonic IDs is benefitting from spatial locality and rowid tables are not. I suspect when giving their advice, they are not considering queries of contiguous ranges of ids (which happens naturally more often than you'd think when JOINs are involved).

benediktwerner

I mean, you might still be right but I'd rather suspect that they didn't consider somebody running SQLite over S3 with 100ms latency ...

Presumably, it is faster on a saner filesystem.

maples37

https://news.ycombinator.com/item?id=42965714 will work if you're just bookmarking for the future

Retr0id

Neat! It's surprisingly easy to sit down with a copy of the sqlite file format docs and start writing code to parse sqlite files (like this project does), and I'd strongly recommend it for all the same reasons listed in the "Why?" section of the readme here.

https://www.sqlite.org/fileformat.html

grimgrin

xrd

Redbean is indeed so cool.

I always wanted to do something using Janet in the same way with images and compile-time programming. Fun possibilities.

https://janet.guide/compilation-and-imagination/

shlomo_z

Thank you!

As someone who isn't disciplined enough to sit through a course or class, this was a really good way to visualize what's going on under the hood, and how to structure my data more efficiently.

hoc

Did that for a security tool about more than ten years ago. Makes you love that tightly designed engine a bit more.

I always thought an explorer or just an base lib would be fun. Great to see yours, especially with a MIT license. Tbanks for sharing.

aappleby

Tool works well.

artps

[dead]

luizfelberti

Upvoted for using Redbean. I've been using it recently and it has been absolutely amazing, the built-in functionality that it has exposed through the Lua interface makes it an extensively programmable proxy that you can sandbox the crap out of if you're familiar with the unixy bits of it

null

[deleted]