What If OpenDocument Used SQLite?

39 comments

·September 4, 2025

agwa

If you're going to use SQLite as an application file format, you should:

1. Enable the secure_delete pragma <https://antonz.org/sqlite-secure-delete/> so that when your user deletes something, the data is actually erased. Otherwise, when a user shares one of your application's files with someone else, the recipient could recover information that the sender thought they had deleted.

2. Enable the options described at <https://www.sqlite.org/security.html#untrusted_sqlite_databa...> under "Untrusted SQLite Database Files" to make it safer to open files from untrusted sources. No one wants to get pwned when they open an email attachment.

3. Be aware that when it comes to handling security vulnerabilities, the SQLite developers consider this use case to be niche ("few real-world applications" open SQLite database files from untrusted sources, they say) and they seem to get annoyed that people run fuzzers against SQLite, even though application file formats should definitely be fuzzed. https://www.sqlite.org/cves.html

They fail to mention any of this on their marketing pages about how you should use SQLite as an application file format.

chris_wot

"Most applications can use SQLite without having to worry about bugs in obscure SQL inputs." And then they recommend SQLite as a document interchange format.

Seattle3503

Hrm, using sqlite as an application format would be a good use case for Limbo.

liuliu

One thing I would call out, if you use SQLite as an application format:

BLOB type is limited to 2GiB in size (int32). Depending on your use cases, that might seem high, or not.

People would argue that if you store that much of binary data in a SQLite database, it is not really appropriate. But, application format usually has this requirement to bundle large binary data in one nice file, rather than many files that you need to copy together to make it work.

Retr0id

You can split your data up across multiple blobs

lifthrasiir

SQLite can't be reliably used in networked file systems because it heavily relies on locking to be correctly implemented. I recently had to add a check for such file systems in my application [1] because I noticed a related corruption firsthand. Simpler file formats do not demand such requirements. SQLite is certainly good, but not for this use.

[1] https://github.com/lifthrasiir/angel/commit/50a15e703ef2c1af...

greenavocado

Easy fix is an empty lock file adjacent to the real one.

lifthrasiir

Yeah, but only if SQLite did support that mode in some built-in VFS implementation...

hedora

Which network filesystems are still corrupting sqlite files?

Sqlite on NFSv3 has been rock solid for some NFS servers for a decade.

Maybe name and shame?

chmaynard

Dr. Hipp occasionally gets on a soapbox and extolls the virtue of sqlite databases for use as an application file format. He also preaches about the superiority of Fossil over Git. His arguments generally make sense. I tolerate his sermons because he is one of the truly great software developers of our time, and a personal hero of mine.

jll29

I love SQLite.

As a document _exchange_/_interchange_ format, what I prefer for durability is a non-binary format (e.g. XML based).

For local use, I agree SQLite might be much faster than ZIP, and of course the ability to query based on SQL has its own flexibility merits.

floating-io

An interesting skim, but it would have been more meaningful if it had tackled text documents or spreadsheets to show what additional functionality would be enabled with those beyond "versioning".

Maybe it's just me, but I see the presentation functionality as one of the less used aspects of the OpenOffice family.

sakesun

If I remember correctly Mendix project file format is simply a sqlite db. I thought the designer was lazy but it turns out it's a reasonable decision.

Recently, DuckDB team raise similar question on DataLake catalog format. Why not just use SQL database for that ? It's simpler and more efficient as well.

sgc

It seems like it would be relatively straightforward to make an sqlite based file format and just have users add a plugin if for some reason they couldn't upgrade their older version of LibreOffice etc. I agree with the other commenter who mentioned that the benefits for text and spreadsheet files needs more explanation. But it seems like a good enough idea to have a LibreOffice working group perform a more in depth study. If significant memory reduction is real and that would translate to fewer crashes, it would be a huge boost even if it had no other benefits, IMHO.

conorbergin

I've being trying out SQLite for a side project of mine, a virtual whiteboard, I haven't quite got my head around it, but it seems to be much less of a bother than interacting with file system APIs so far. The problem I haven't really solved is how sync and maybe collaboration is going to interact with it, so far I have:

1. Plaintext format (JSON or similar) or SQLite dump files versioned by git

2. Some sort of modern local first CRDT thing (Turso, libsql, Electric SQL)

3. Server/Client architecture that can also be run locally

Has anyone had any success in this department?

rogerbinns

SQLite has a builtin session extension that can be used to record and replay groups of changes, with all the necessary handling. I don't necessarily recommend session as your solution, but it is at least a good idea to see how it compares to others.

https://sqlite.org/sessionintro.html

That provides a C level API. If you know Python and want to do some prototyping and exploration then you may find my SQLite wrapper useful as it supports the session extension. This is the example giving a feel for what it is like to use:

https://rogerbinns.github.io/apsw/example-session.html

hahn-kev

CRDTs are the way to go if you need something very robust for lots of offline work.

atonse

Didn’t Apple actually move to SQLite for their Pages/Numbers format? I remember reading years ago that it was rocky (the transition), but was maybe eventually smoothed out?

mdaniel

Given n=1 https://freeiworktemplates.com/2022/05/pages-concessions-sta... seems to imply the answer is "no, it's a zip" and that seems to hold even for the interior files

  $ file concessions-stand-menu-template.pages
  concessions-stand-menu-template.pages: Zip archive data, at least v2.0 to extract, compression method=store

  $ unzip -l concessions-stand-menu-template.pages
  Archive:  concessions-stand-menu-template.pages
    Length      Date    Time    Name
  ---------  ---------- -----   ----
      58727  05-09-2022 13:27   Data/Artboard 2-26.png
      26993  05-09-2022 13:27   Data/Artboard 2-small-27.png
      11550  05-10-2022 08:13   Index/Document.iwa
        720  05-10-2022 08:13   Index/ViewState.iwa
        536  05-09-2022 12:41   Index/CalculationEngine-1686619.iwa
         23  07-02-2021 17:48   Index/AnnotationAuthorStorage-1686618.iwa
      43891  05-09-2022 12:41   Index/DocumentStylesheet.iwa
        229  05-09-2022 13:28   Index/DocumentMetadata.iwa
      17895  05-10-2022 08:13   Index/Metadata.iwa
        379  05-10-2022 08:13   Metadata/Properties.plist
         36  05-09-2022 12:41   Metadata/DocumentIdentifier
        268  04-29-2022 22:18   Metadata/BuildVersionHistory.plist
     135503  05-10-2022 08:13   preview.jpg
       1666  05-10-2022 08:13   preview-micro.jpg
      11057  05-10-2022 08:13   preview-web.jpg
  ---------                     -------
     309473                     15 files

  $ unzip concessions-stand-menu-template.pages Index/Document.iwa
  extracting: Index/Document.iwa

  $ file Index/Document.iwa
  Index/Document.iwa: data

  $ xxd -l 128
  00000000: 001a 2d00 bcae 0170 6408 0112 6008 904e  ..-....pd...`..N
  00000010: 1203 0100 0518 c90c 2209 0a03 0a01 3010  ........".....0.
  00000020: 0118 0122 0701 0b08 2e18 0109 1400 2f05  ..."........../.
  00000030: 14f4 a801 0b0a 050a 030f 0111 1003 1800  ................
  00000040: 2a27 daf8 66db f866 dcf8 66e6 f768 ddf8  *'..f..f..f..h..
  00000050: 66df ef66 def8 66d1 f666 fdf5 66d5 f566  f..f..f..f..f..f
  00000060: 8ff8 66df f866 86f9 6612 0408 def8 661a  ..f..f..f.....f.
  00000070: 0408 fdf5 6622 0408 8ff8 6632 0408 dfef  ....f"....f2....

RainyDayTmrw

Juggling all the fragments inside the database, garbage collecting all the unused ones, and maintaining consistency are all quite challenging in this use case.

supportengineer

What if instead of API's for data sets, we simply placed a sqlite file onto a web server as a static asset, so you could just periodically do a GET and have a local copy.

yupyupyups

This works as long as the data is "small" and you have no ACL for it. Assuming you mean automatic downloads.

Devdocs does something similar, but there you request to download the payload manually, and the data is still browsable online without you having to download all of it. The data is also split in a convenient manner (by programming language/library). In other words, you can download individual parts. The UI also remains available offline, which is pretty cool.

https://devdocs.io/

abtinf

A few years ago someone posted a site that showed how to query portions of a SQLite file without having to pull the whole thing down.

dbarlett

https://news.ycombinator.com/item?id=27016630

supportengineer

>> I implemented a virtual file system that fetches chunks of the database with HTTP Range requests

That's wild!

abtinf

With an S3 object lambda, I suppose you could generate the sqlite file on the fly.

anon291

You can do this today by using the WASM-compiled SQLite module with a custom Javascript VFS that implements the SQLite VFS api appropriately for your backend. I've used it extensively in the past to serve static data sets direct from S3 for low cost.

More industrious people have apparently wrapped this up on NPM: https://www.npmjs.com/package/sqlite-wasm-http

HN

What If OpenDocument Used SQLite?

What If OpenDocument Used SQLite?