Skip to content(if available)orjump to list(if available)

Supercharge SQLite with Ruby Functions

Supercharge SQLite with Ruby Functions

40 comments

·January 24, 2025

ncruces

Taking full advantage of SQLite's extensibility is a center piece of my Go driver.

It started due to having to reimplement the OS layer in Go because of tech constraints, but it means you _can_ implement VFSes, UDFs (scalar, aggregates and windows), and virtual tables in Go, with reasonable performance and nice APIs.

I also made a point of dogfooding this as much as possible, with a bunch of extensions and a few custom VFSes that use the same APIs available to clients of the library.

https://github.com/ncruces/go-sqlite3/tree/main/ext

https://github.com/ncruces/go-sqlite3/tree/main/vfs#custom-v...

oefrha

Interesting, I need to take a look at your vfs implementations. Some of my projects could benefit from a transparent zstd vfs layer, but compiling and loading one of the existing C options in a Go project is yucky even if possible.

ncruces

I've never done it, but for the readonly case, I think all you need to is to combine the existing reader VFS with zstd-seekable-format-go.

https://github.com/ncruces/go-sqlite3/discussions/117#discus...

For read/write, I'm honestly not sure. The Zipvfs is an… erm… architectural mess that only really works because it accesses private SQLite APIs. Which is fine, but history has shown the SQLite team is willing to break those APIs, as they did for the ones they build the SQLite encryption extension on.

https://sqlite.org/zipvfs/doc/trunk/www/howitworks.wiki

The zstandard alternative is sqlite_zstd_vfs, which faces the same architectural issues. So, I'd rather not go there. But should be doable, as long as you're not needing private APIs.

https://github.com/mlin/sqlite_zstd_vfs

hinkley

The problem with those API breaks is rarely the fact that they got broken, it's the fire drill that happens after they do.

If you get lucky, the 2 necessary upgrades happen at a time that fits well into your schedule. If you don't get lucky, then the SQLite upgrade you need contains a CERT advisory for a zero day attack, and not only does that not fit into your schedule but it also doesn't fit into the schedule of the person who did the customization.

These are rare events but over the course of a project, that low priority taken to the exponent of the number of vendors you decide to play that game with, approaches or exceeds a probability of 1.00 (>1 meaning 'happened to us twice')

oefrha

Thanks for the pointers!

newusertoday

i wish there was something like this for wasm. There are wasm builds available but its not easy to extend them with your own functions.

bob1029

You can hypothetically write your entire product in SQL with appropriate bindings to SQLite. For me, SQL with CTEs is a very compelling way to model tricky business logic. Building a SQL-based rules engine is trivial if your host language has a good SQLite library.

This thing starts to grow legs once you realize you can recursively get into the rabbit hole by binding something like an Execute_Sql UDF - You can store the actual scripts within the same schema they operate on. Treating your code as data means you can do things like transactional updates of business logic while the system is serving live requests. You also get simple reflection & search over the business logic.

maCDzP

Is this the same thing as create_function for python?

https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne...

julik

OP here. Yes, it is, except that the only builtin flag it offers is `deterministic` - if you want to also use "directonly" you probably will need the same OR trick.

tobyhinloopen

Ive nothing to say but “that’s cool”. I want to try this in NodeJS!

I suppose using functions defined by the host with SQLite is cheaper than using similarly defined functions in databases that are separated by a network. I wonder what the overhead is.

Also, what’s with the weird UUIDs? Why not use UUIDv7 if you want time-ordered UUIDs?

julik

> Why not use UUIDv7 if you want time-ordered UUIDs?

It is our flavour of NIH, that said - Tou has a finer-resolution timestamp. We also didn't do our homework right and assumed the v7 UUIDs won't be accepted by Postgres because of a different "version" value.

masklinn

> I suppose using functions defined by the host with SQLite is cheaper than using similarly defined functions in databases that are separated by a network.

“Infeasible” is very fast. sqlite runs in process so you can register a function pointer or five, with a trampoline back into the runtime.

Can’t do that over the network, you can create functions but only using the database’s procedural langage(s in the case of Postgres).

anarazel

FWIW, the set of procedural languages in postgres is runtime extensible: https://www.postgresql.org/docs/current/sql-createlanguage.h...

dragonwriter

> Also, what’s with the weird UUIDs? Why not use UUIDv7 if you want time-ordered UUIDs?

It looks like the rationale for the Tou library is that some systems do not accept unfamiliar UUID variants as UUIDs, so a time-ordered ID that looks like a UUIDv4 is safer for some legacy systems than a (newer, and less likely recognized) UUIDv7.

julik

This. We did discover, however, that Postgres will, in fact, swallow a UUIDv7 just fine. After having written that library :-)

dragonwriter

I kind of thought it would ingest but not generate unknown UUID versions that nonetheless fit the broader UUID structure, but not having tried I didn't want to bring that up.

julik

It is much cheaper, because you won't have roundtrips or requirements for the availability of extensions on the database server end. It's really a very very sweet capability that SQLite is able to provide exactly because it is hosted by the application.

relistan

> Also, what’s with the weird UUIDs? Why not use UUIDv7 if you want time-ordered UUIDs?

I’m not the author but UUIDv7 came out in about 2022. Guessing this is legacy stuff that long predated that. There were lots of solutions to solve this problem before there was a standard.

dragonwriter

> Guessing this is legacy stuff that long predated that.

I’m guessing its not, since the tou library seems to be 8 months old and mentions avoiding the need for extensions if you are using it with Postgres as an advantage over using UUIDv7.

relistan

Great, thanks

postepowanieadm

better-sqlite3 got you covered!

tobyhinloopen

better-sqlite3 is (was?) my favorite sqlite3 lib, it is incredibly fast.

NodeJS has SQLITE3 support these days!

https://nodejs.org/api/sqlite.html

Interestingly it is NOT async, like better-sqlite3. I wonder why. I've been looking for any public remarks about it, but found nothing.

darthShadow

These may be useful reading:

https://github.com/TryGhost/node-sqlite3/issues/408#issue-57...

https://github.com/WiseLibs/better-sqlite3/issues/32#issueco...

Copying a quote from the second:

> The sqlite3 C API serializes all operations (even reads) within a single process. You can parallelize reads to the database but only by having multiple processes, in which case one process being blocked doesn't affect the other processes anyways. In other words, because sqlite3 serializes everything, doing things asynchronously won't speed up database access within a process. It would only free up time for your app to do other things (like HTTP requests to other servers). Unfortunately, the overhead imposed on sqlite3 to serialize asynchronous operations is quite high, making it disadvantageous 95% of the time.

postepowanieadm

That's nice! I use bun, but it's lack of support for UDF makes me consider switching back to node.

mingodad

shakna

Also part of Python's standard implementation of sqlite3. [0]

[0] https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne...

julik

With any decent host runtime, as it were.

steve_gh

There are also some good libs for SQLite. This is the standard set that I use. https://github.com/nalgeon/sqlean

dennisvdvliet

That is pretty cool. And good to see you still doing talks.

melvinroest

I happen to work with BigQuery since recently and heard you can do UDFs with JavaScript. Good to know that this is a thing with more databases. I didn't need to use UDFs up until this point but now that I know about it a bit more, I just might as JS happens to be a strong language of mine (currently not using it professionally).

julik

We actually use a couple Ruby UDFs with BigQuery, deployed as Google Cloud Functions. Works pretty well.

Alifatisk

What an interesting find

throw_m239339

Yes, I remember using that in PHP, very handy before sqlite JSON support, since it also worked with aggregate functions.

https://www.php.net/manual/en/sqlite3.createfunction.php

igravious

from the comments:

   [Python](https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.create_function)
https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne...

   [Lua](http://lua.sqlite.org/index.cgi/doc/tip/doc/lsqlite3.wiki#db_create_function)
http://lua.sqlite.org/index.cgi/doc/tip/doc/lsqlite3.wiki#db...

   [Node.js](https://nodejs.org/api/sqlite.html#databasefunctionname-options-function)
https://nodejs.org/api/sqlite.html#databasefunctionname-opti...

   [PHP](https://www.php.net/manual/en/sqlite3.createfunction.php)
https://www.php.net/manual/en/sqlite3.createfunction.php