On File Formats
9 comments
May 21, 2025
thasso
For archive formats, or anything that has a table of contents or an index, consider putting the index at the end of the file so that you can append to it without moving a lot of data around. This also allows for easy concatenation.
charcircuit
Why not put it at the beginning, so that it is available at the start of the file stream? That way it is easier to fetch first, so you know which other ranges of the file you may need.
>This also allows for easy concatenation.
How would it be easier than putting it at the front?
shakna
[delayed]
lifthrasiir
If the archive is being updated in place, turning ABC# into ABCD#' (where # and #' are indices) is easier than turning #ABC into #'ABCD. The actual position of indices doesn't matter much if the stream is seekable. I don't think the concatenation is a good argument though.
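A minimal sketch of the ABC# to ABCD#' case, assuming a made-up layout: member bytes, then a JSON index, then an 8-byte footer holding the index offset. The names `read_index`, `append_member`, and `read_member` are hypothetical and do not come from any real archive format. Appending overwrites only the old index and footer; the existing members never move.

```python
import io
import json
import struct

# Hypothetical layout: [member bytes...][JSON index][8-byte footer].
# The footer stores the offset at which the index begins.
FOOTER = struct.Struct("<Q")  # little-endian unsigned 64-bit integer


def read_index(f):
    """Return (index, index_offset); an empty file has an empty index."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size == 0:
        return {}, 0
    f.seek(size - FOOTER.size)
    (index_offset,) = FOOTER.unpack(f.read(FOOTER.size))
    f.seek(index_offset)
    index = json.loads(f.read(size - FOOTER.size - index_offset))
    return index, index_offset


def append_member(f, name, data):
    """Append one member, rewriting only the trailing index and footer."""
    index, index_offset = read_index(f)
    f.seek(index_offset)              # new data overwrites the old index
    f.write(data)
    index[name] = [index_offset, len(data)]
    new_index_offset = f.tell()
    f.write(json.dumps(index).encode("utf-8"))
    f.write(FOOTER.pack(new_index_offset))
    f.truncate()                      # drop any leftover bytes past the footer


def read_member(f, name):
    """Look the member up in the trailing index and read its bytes."""
    index, _ = read_index(f)
    offset, length = index[name]
    f.seek(offset)
    return f.read(length)
```

Doing the same with the index at the front would mean shifting every member whenever the index grows, unless space is reserved for it in advance.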
mjevans
Most of that's pretty good.
Compression: For anything that ends up large it's probably desired. Though consider both algorithm and 'strength' based on the use case carefully. Even a simple algorithm might make things faster when it comes time to transfer or write to permanent storage. A high cost search to squeeze out yet more redundancy is probably worth it if something will be copied and/or decompressed many times, but might not be worth it for that locally compiled kernel you'll boot at most 10 times before replacing it with another.
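One way to weigh 'strength' against cost is to measure it on representative data. This is a rough sketch using Python's `zlib` only as a stand-in for whatever algorithm the format would actually use; `compare_levels` and the sample data are invented for illustration.

```python
import time
import zlib


def compare_levels(data, levels=(1, 6, 9)):
    """Compress data at each level; return {level: (size, seconds, packed)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results[level] = (len(packed), elapsed, packed)
    return results


if __name__ == "__main__":
    # Made-up, highly repetitive sample; real data will behave differently.
    sample = b"".join(b"record %d: status=ok\n" % i for i in range(50_000))
    for level, (size, seconds, _) in compare_levels(sample).items():
        print(f"level {level}: {size} bytes in {seconds * 1000:.1f} ms")
```

Whether the extra time at a higher level pays off depends on how many times the result is copied or decompressed afterwards.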
leiserfg
If binary, consider just using SQLite.
lifthrasiir
Using SQLite as a container format is only beneficial when the file format itself is a composite, like word processor files which will include both the textual data and any attachments. SQLite is just a hindrance otherwise, like image file formats or archival/compressed file formats (cf. sqlar).
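To illustrate the composite case, here is a small sketch using Python's standard `sqlite3` module. The `parts` table and the helper names are assumptions made up for this example, not any real application's schema.

```python
import sqlite3


def open_document(path=":memory:"):
    """Open (or create) a document file that is really an SQLite database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS parts ("
        " name TEXT PRIMARY KEY,"
        " media_type TEXT NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return db


def put_part(db, name, media_type, body):
    """Insert or replace one part; the 'with' block commits the transaction."""
    with db:
        db.execute(
            "INSERT OR REPLACE INTO parts (name, media_type, body)"
            " VALUES (?, ?, ?)",
            (name, media_type, body),
        )


def get_part(db, name):
    """Return (media_type, body) for a part, or None if it does not exist."""
    return db.execute(
        "SELECT media_type, body FROM parts WHERE name = ?", (name,)
    ).fetchone()
```

The text and each attachment can then be updated independently, with SQLite handling the on-disk layout and atomic writes.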
adelpozo
I would add: make it streamable, or at least allow it to be read remotely efficiently.