
Ask HN: Code should be stored in a database. Who has tried this?

49 comments

·March 30, 2025

To me it seems obvious that code should be stored in a database rather than a hierarchical, text-based format.

The main way we navigate and organize code is by folder hierarchies. Everyone has a different approach: by feature, by module, by file type (template, component, etc.), by environment (backend/frontend).

Rather than folders and file names, everything could just be tagged in different ways.
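
As a rough illustration of what "tagged instead of filed" could look like, here is a minimal sketch using SQLite from the Python standard library. The table names (`snippet`, `tag`) and the helper functions are hypothetical, invented for this example; they do not come from any existing tool.

```python
import sqlite3

def open_store() -> sqlite3.Connection:
    """Create an in-memory store: one row per code unit, many tags per unit."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE snippet (id INTEGER PRIMARY KEY, name TEXT UNIQUE, body TEXT);
        CREATE TABLE tag (snippet_id INTEGER REFERENCES snippet(id), label TEXT,
                          PRIMARY KEY (snippet_id, label));
    """)
    return conn

def add_snippet(conn, name, body, tags):
    """Store a code unit and attach any number of tags to it."""
    cur = conn.execute("INSERT INTO snippet (name, body) VALUES (?, ?)", (name, body))
    conn.executemany("INSERT INTO tag VALUES (?, ?)",
                     [(cur.lastrowid, t) for t in tags])

def find_by_tags(conn, tags):
    """Return the names of snippets that carry every tag in `tags`, sorted."""
    tags = list(tags)
    marks = ",".join("?" * len(tags))
    rows = conn.execute(
        f"SELECT s.name FROM snippet s JOIN tag t ON t.snippet_id = s.id "
        f"WHERE t.label IN ({marks}) "
        f"GROUP BY s.id HAVING COUNT(DISTINCT t.label) = ? ORDER BY s.name",
        (*tags, len(set(tags))),
    )
    return [r[0] for r in rows]
```

With this layout, "by feature", "by file type" and "by environment" are all just tag queries, e.g. `find_by_tags(conn, ["frontend", "auth"])`, and no single axis has to be chosen as the top-level hierarchy.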

Who has tried this and what is the best tool for working like this today?

igouy

"ENVY/Manager augments this model by providing configuration management and version control facilities. All code is stored in a central database rather than in files associated with a particular image. Developers are continuously connected to this database; therefore changes are immediately visible to all developers.""

https://www.google.com/books/edition/Mastering_ENVY_Develope...

~

pdf 1992 Product Review: Object Technology’s ENVY Developer

http://archive.esug.org/HistoricalDocuments/TheSmalltalkRepo...

Lutger

You don't have to store things in a database to do this. Code is almost always read from disk into some kind of in-memory data structure that is amenable to such analysis, maybe even more so than a generic database. It doesn't matter whether you use VS Code or Vim: most developers have some kind of tool that does semantic analysis and affords navigation and organization of code.

It's just that the main way code editors present navigation follows the path hierarchy, also because it's often intimately tied to how programming languages shape modules. Most editors have at least some alternative navigation, however, and most people use at least some of it: outlining by declaration symbols, search, changes, unit tests, open files, bookmarks, etc.
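
To make "outlining by declaration symbols" concrete, here is a small sketch that parses source text into an in-memory tree and lists its declarations, independent of where the file lives on disk. It uses Python's standard `ast` module; the `outline` function is a hypothetical helper written for this example, not part of any editor.

```python
import ast

def outline(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, qualified_name, line) for each class and function."""
    entries = []

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(child, ast.ClassDef) else "def"
                name = prefix + child.name
                entries.append((kind, name, child.lineno))
                # Nested declarations get a dotted, qualified name.
                visit(child, name + ".")
            else:
                visit(child, prefix)

    visit(ast.parse(source), "")
    return entries
```

An editor-style symbol view is then just a sort or filter over these tuples. Real tools keep a far richer index (types, references, call graphs), but the principle is the same.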

So in a way, this is already how it is done, except that the 'database' part is tied to the code editor, while its storage component is nicely decoupled (in the end, databases are usually also just a bunch of files).

I think any real improvements on this model can only come from a new programming language design, and as others have pointed out, this hasn't caught on in the past. The reason for this is probably not that file oriented modularity is the best thing there is, but rather the escape velocity needed to get out of the vast ecosystem of tooling around files, like the OS, git and existing code editors and whatnot.

jrjsmrtn

Hmmm... Smalltalk, a pure object-oriented language, stores everything in an image, and has tons of different browsers to inspect its "object soup". Install a Squeak Smalltalk if you're curious :-)

UserLand Frontier was a wonderful scripting environment born on the classic Mac OS and ported to Windows. It was a mix of an object database, storing code and data, an extensible scripting language called UserTalk, and very powerful inter-application capabilities, based on Apple's Open Scripting Architecture. Dave Winer, its author, worked on the XML-RPC standard afterwards.

igouy

Smalltalk stores:

memory snapshot "image"

AND "change log" text file

AND "sources" text file.

https://cuis-smalltalk.github.io/TheCuisBook/Code-Management...

If the "sources" file is missing the byte code will be decompiled to show class and method definitions, but the original names will be unknown.

xkriva11

It is only an implementation detail. It only takes changing a few methods to store sources directly in methods. I tried that once.

igouy

It's all only "an implementation detail".

Some of them are documented and expected.

lutzh

In the Unison language, code is stored in a database, with a hash code of its content as the key. Quoting https://www.unison-lang.org :

A new approach to Storing code. Other tools try to recover structure from text; Unison stores code in a database. This eliminates builds, provides for instant nonbreaking renames, type-based search, and lots more.
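
The idea of keying code by a hash of its content can be sketched in a few lines. This is not Unison's actual algorithm or storage format, only an illustration under simplifying assumptions: the hypothetical `CodeStore` class hashes the parsed syntax tree of a single Python function with its name blanked out, so the name becomes metadata that points at the hash.

```python
import ast
import hashlib

class CodeStore:
    """Toy content-addressed store: definitions keyed by hash, names kept separately."""

    def __init__(self):
        self.definitions = {}  # hash -> source text
        self.names = {}        # human-readable name -> hash

    def add(self, source: str) -> str:
        tree = ast.parse(source)
        func = tree.body[0]
        if not isinstance(func, ast.FunctionDef):
            raise ValueError("expected a single function definition")
        name = func.name
        func.name = "_"  # the name is metadata, not part of the hashed content
        # ast.dump omits line numbers, so comments and whitespace do not affect the hash.
        digest = hashlib.sha256(ast.dump(tree).encode()).hexdigest()
        self.definitions[digest] = source
        self.names[name] = digest
        return digest

    def rename(self, old: str, new: str) -> None:
        """Renaming only touches the name table; the stored definition is unchanged."""
        self.names[new] = self.names.pop(old)
```

Because callers would refer to the hash rather than the name, a rename cannot break them, which is the "instant nonbreaking renames" property in miniature. The sketch ignores recursion and references to other definitions, which a real system has to handle.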

cdirkx

I don't think this is unique to code, but a limitation of filesystems in general. You could make the same argument for photos: I want them sorted by date, by tag, by person in the image, by location.

I can do this in Lightroom or my "Photo" app, but then you are always reliant on some third-party tool. It would be nice if there was some native way for files to not have to commit to a single hierarchy, but be able to switch views on the fly (without it being insanely slow for larger numbers of files).

unilynx

We did this for a long time for our CMS - although we did simulate a filesystem structure. We also set up a git-like system to store versioning information and set up WebDAV to mount it all and allow direct source code editing. It worked pretty well for years.

We eventually stopped because we were relying much more on external tools (e.g. npm, webpack), which had all sorts of issues over WebDAV mounts. Maintaining all this code management infrastructure in parallel wasn't worth it in the end, and we moved the code back to disk, switched to git, etc.

And Photoshop silently ignoring WebDAV I/O errors when saving designs didn't help either.

You already have tagging by type on the filesystem - the file extension. That allows you to limit file searches. Add extra metadata to extensions if the same extension has different roles (.backend.ts, .frontend.ts, .html.template, .text.template).
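
A minimal sketch of treating compound extensions as tags, using only `pathlib` from the standard library. The function names and the example paths are made up for illustration.

```python
from pathlib import PurePosixPath

def extension_tags(path: str) -> list[str]:
    """Split a compound extension such as '.backend.ts' into ['backend', 'ts']."""
    return [s.lstrip(".") for s in PurePosixPath(path).suffixes]

def filter_by_tags(paths, *wanted):
    """Keep the paths whose extension tags include every wanted tag."""
    wanted = set(wanted)
    return [p for p in paths if wanted <= set(extension_tags(p))]
```

For example, `filter_by_tags(paths, "template", "html")` would keep `mail/welcome.html.template` and drop `mail/welcome.text.template`. One caveat: any other dot in a file name (say, a version number) also shows up as a "tag", so the convention only works if names are kept clean.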

These days I prefer to structure for easy removal of code - everything for e.g. a widget (frontend, backend, CSS) goes into a folder and I only need to remove that folder when the widget is retired, and linting/validation will show me the few remaining path references I need to clean up.

andrewaylett

I do store all my code in a database. It's got time-travel functionality, the ability to switch into parallel universes, and a nice hierarchical view that lets me find things easily if I don't want to use my language-specific indexes.

Yes, that's git, a filesystem, and an IDE -- and the physical layout of the code isn't the way I normally navigate it. It's useful structure for the tooling, though.

It's definitely true that "using git" or "putting our code on the filesystem" aren't ends in themselves; they are means to an end. If we found a way to meet our requirements that has fewer trade-offs than git then I'm sure we'd jump. Git and filesystems are possibly the worst options for organising code and history, except for all the other options out there :P.

r24y

That's basically what an LSP server is. It's true that it's built on top of the file system, and most IDE users will navigate using the folder hierarchy, but it still stores information about the names, types, and connectedness of the codebase, and allows querying. Your idea about arbitrary tags (feature, environment) would be useful but does not seem to be supported by the spec [^1] yet.

[^1]: https://microsoft.github.io/language-server-protocol/specifi...

codingdave

Lotus Notes did that. The database held the code, the data, the UX, the security. There was a standard UX for accessing different types of code, design elements, and the data.

On the positive side, DevOps was a breeze - push a DB to a server and everything just worked. Pushing new code to all the DBs was a breeze. Any dev could immediately jump into an app and have a sense of where they would find elements of the app. All apps ran the same way, so it was realistic for small shops to deliver large products.

On the downside, source control was sub-optimal. That was a weakness in the platform even 25 years ago when it was modern, and never quite improved... although there are ways to import/export the code to make it work with modern source control like git. It also made each app heavier than it needed to be - instead of sharing centralized code, each app had its own copy. Your infrastructure footprint got big, fast.

For a modern take on it, I think other comments are hitting the key point - you might want to have fuzzier definitions of what a database and a file system are. At the end of the day, they are both ways of storing data to disk with different access methods. But it sounds like you are more concerned about DX. To get to your vision, I'd focus more on an IDE that lets you navigate code how you desire, while leaving the actual code storage as a DevOps exercise where they can focus on whatever solution optimizes delivery and reliability.

jFriedensreich

I looked into four interesting incarnations of this over the years:

1. During the peak phase of CouchDB as an application server (2006-2009) it was common to store not just the data but all the app assets and code in the database and replicate everything together. Many in the community tried to take this to the extreme, with every function stored as a versioned document (I see it as a precursor to FaaS) and the whole application editable with an integrated IDE. Also, functions in my incarnation of this system were not loaded by filename but through a content-addressed manifest. You would reference functions by name, but the name would be resolved with a hash manifest.

2. There were several systems on Erlang/BEAM that took hot code replacement to the extreme in a similar way, storing code in, I believe, Mnesia.

3. I think Bloomberg (I cannot find the HN post to confirm it was them; if someone has the link that would be great) has/had a bespoke code database with custom version control and a fully integrated IDE. They leveraged this for some pretty interesting workflows.

4. Probably not exactly what you mean, as it does not include the runtime integration, but Google and Sourcegraph are building code databases with indices on symbols and semantic understanding of references and more. I hear great things from people who worked with it especially

movpasd

I can think of an argument for justifying the status quo.

The folder structure reflects the subdivision of code into modules. Each module may have submodules, and each module decides the visibility of its children to other modules at the same level as itself, and to its own supermodule. This is a naturally hierarchical structure, which file systems lend themselves well to. A code database would have to replicate this structure within it somehow anyway.

A non-hierarchical tag system would help model situations where you have multiple orthogonal axes along which to organise the code (as you point out). But in these cases, which axis gets the top-level hierarchy just doesn't matter. Pick one, maybe loosely informed by organisational factors or by your problem conceptualisation.

On the flipside, in situations where a stricter hierarchy would improve modularity, the tag system might _discourage_ clean crystallisation, and cause responsibilities to bleed into each other. IMO, it's more important for there to be modules at all than for their boundaries to be perfect.