Skip to content(if available)orjump to list(if available)

Postgres as a Graph Database: (Ab)Using PgRouting

ashvardanian

Five years ago I was absolutely frustrated with the state of Graph databases and libraries and tried putting several non-Graph DBMSs behind a NetworkX-like Python interface <https://github.com/unum-cloud/NetworkXum>.

When benchmarked, Neo4J crashed on every graph I’ve tried <https://www.unum.cloud/blog/2020-11-12-graphs>, making SQLite and Postgres much more viable options even for network-processing workloads. So I wouldn’t be surprised to learn that people actually use pgRouting and Supabase in that setting.

With the rise of Postgres-compatible I’m wondering if it’s worth refreshing the project. Similarly, there are now more Graph DBs like MemGraph compatible with CYPHER, which should probably work much better than Neo4J.

henryfjordan

I had almost exactly the opposite experience, although my dataset was pretty small.

We wanted to store a graph in postgres and ended up writing some recursive queries to pull subgraphs then had NetworkX layered over it to do some more complex graph operations. We ended up doing that for a short while but then switched to Neo4j because of how comparatively easy it was to write queries (although the Python support for Neo4j was severely lacking). Never really stressed it out on dataset size though.

I did manage to crash Redis' graph plugin pretty quickly when I was testing that.

SahAssar

Not sure what you consider "quite small" and I don't know how NetworkX works, but postgresql recursive queries have worked well for me for small graphs.

Could you share what the data structure and scale was?

henryfjordan

We basically had a single table that we wanted to be able to nest on itself arbitrarily. Think categories and subcategories, maybe 100k nodes/rows

Postgres worked fine but cypher is so much more expressive and handles stuff like loop detection for you, neo4j was much easier to work with. Performance wasn't ever really an issue with either.

nemo44x

Neo4j is pretty bad and very dated. No idea what they’re doing. MemGraph is a much better tool.

Really graph is a feature and not a product.

nrjames

I’ve always wondered why there isn’t a “SQLite for graphs,” so to speak. Is there something about how they have to be stored that precludes an in-process solution with disk-based storage?

ryangs

https://www.hillelwayne.com/post/graph-types/ gives an interesting take on why we don't see a graph type as a primitive on more programming languages. Essentially boils down to graphs being very vague and depending on the topology of your graphs you are going to want different implementations for reasonable efficiency.

That said, there are graph databases.

nrjames

Interesting! Thanks for the link. I suppose the graph databases just take an opinionated approach. NetworkX is great; I always wished it had a simple backend.

canadiantim

There is now, it's Kuzudb, embedded as well.

kiwicopple

My original goal in this article was to figure out if pgrouting would be a good tool to build a memory-layer (for AI/agents) but the article got a bit long. Early results are promising - I’ll follow up with another article soon

there are some other interesting extensions in this space - onesparse[0] is early in development but pretty exciting as it builds on SuiteSparse which is very mature

[0] https://onesparse.com/docs.html

michelpp

Thanks Paul! OneSparse author here, we're still in early dev stages (OneSparse requires some new features in postgres that won't be available until pg18) but my plan is to do a benchmark shootout with various graph tool for postgres. Initial results look good, on my 4-core 11th gen intel laptop we're getting some really good numbers!

             LiveJournal                 Orkut
  Nodes:     3,997,962                   3,072,441
  Edges:     34,681,185                  117,185,037
  Triangles: 177,820,130                 627,583,972

                  Seconds  Edges/Second  Seconds  Edges/Second
  Tri Count LL:   2.69     12,892,634    32.03    3,658,602
  Tri Count LU:   1.78     19,483,812    16.38    7,156,338
  Tri Centrality: 1.45     23,918,059    12.22    9,589,610
  Page Rank:      7.12     4,870,953     23.14    5,064,176
Orkut was as big as I could go due to limited RAM. One of my constrains is limited access to big enough hardware to do the kinds of Graphs Of Unusual Size (billions of edges, trillions of triangles) where we can really flex the scale that CUDA support gives us. Stay tuned!

szvsw

Supabase consistently puts out such fantastic bite-sized gems - and for me some of my favorites have been related to PostGIS - whether it’s about serving tiles directly, or this (ab)use of functionality typically used in a PG geospatial context. Nothing revolutionary or massive and complex - not like reading DDIA of course, but just fun and mentally activating, making me want to jump into something new. I really applaud them for frequently posting actually engaging content that just gets you excited to work with databases… it sounds silly to say it like that, but it does feel like I get regularly struck with the feeling of sadness when I realize how vanilla all of my daily development related interactions with dbs are so vanilla.

holtwork

I'm working on a little Poatgres graph db project. The querying and table structure is much simpler for the same task:

https://memelang.net/03/ https://github.com/memelang-net/memesql3

xnx

Resourceful, but is there a reason to use this approach over pgvector?

nyrikki

There is an unfortunate overloaded of the terms relational and relationships in relational databases.

Relationships is association between relations/tables, parent-child, node/edge etc, depending on model, extensions etc..

There are three basic models of databases:

      model name.  | basic data structure
      -----------------------------------
      relational   | tables
      hierarchical | trees
      network      | graph 
A "relational" in RDBMS and Codd's rules is just a table data structure with some additional rules.

Part of those rules are a named table, with named and typed attributes (columns) with data in the form of rows of tuples.

PgVector is nearest neighbor search for tuple values, often from a single table/relation while PgRouting is graph traversal for relational data.

There is a bit more to that and in the relational model the data is independent of the schema, and no RDBMS is pure.

It is possibly helpful to realize that pgvector is about finding neighbors among tuples, in that relation/table, and that it is very different from graph traversal.

whalesalad

Has anyone done something like this with a very large dataset? Hundreds of millions of rows.

canadiantim

Supabase such a gem