Postgres as a Graph Database: (Ab)Using PgRouting

46 comments

·February 27, 2025

ashvardanian

Five years ago I was absolutely frustrated with the state of Graph databases and libraries and tried putting several non-Graph DBMSs behind a NetworkX-like Python interface <https://github.com/unum-cloud/NetworkXum>.

When benchmarked, Neo4J crashed on every graph I’ve tried <https://www.unum.cloud/blog/2020-11-12-graphs>, making SQLite and Postgres much more viable options even for network-processing workloads. So I wouldn’t be surprised to learn that people actually use pgRouting and Supabase in that setting.

With the rise of Postgres-compatible I’m wondering if it’s worth refreshing the project. Similarly, there are now more Graph DBs like MemGraph compatible with CYPHER, which should probably work much better than Neo4J.

BlooIt

Checkout https://github.com/Pometry/Raphtory, it's written in Rust, embedded (the binaries are about 20mb) and you can use the Python APIs as a drop-in replacement for NetworkX. Disclaimer, I am one of the people behind it.

pjd7

This looks super interesting.

Just starting to review it but my front of mind questions: 1) How do I handle persistence? Looks like some code is missing. 2) Do you support multi-tenancy (b2b saas graph backend for handling relations scoped to a tenant)

Thanks

BlooIt

Good questions.

1) You can persist a graph to disk. By default, this uses protobuf (`save_to_file`), however we’re migrating to Parquet in next release for better performance because we noticed loading a 100m edge graph from scratch (CSV, Pandas, or raw Parquet) is actually faster (~1M rows/sec) than from persisted proto, which isn’t ideal. There’s also a private version that uses custom memory buffers for on-disk storage, handling updates and compaction automatically.

2) You can run a Raphtory instance either as a GraphQL server or an embedded library. For the server, multiple users can query the persisted graphs, which are stored in a simple folder structure with namespaces (for different graphs). For now, access control needs to be managed externally, however it's on our roadmap!

gkorland

Your graph DB frustrations mirror what many experienced with Neo4j. If you refresh your project, consider including FalkorDB (formerly RedisGraph) - it uses sparse adjacency matrices and GraphBLAS for much better performance while supporting Cypher.

Would be interesting to see updated benchmarks comparing these newer options against PostgreSQL extensions.

threeseed

You ran Neo4J with 512MB even thought it has always recommended 2GB at a minimum.

And MemGraph is nice but it's memory only where as Neo4J is designed for super large graphs that live on the filesystem. Not really that comparable.

henryfjordan

I had almost exactly the opposite experience, although my dataset was pretty small.

We wanted to store a graph in postgres and ended up writing some recursive queries to pull subgraphs then had NetworkX layered over it to do some more complex graph operations. We ended up doing that for a short while but then switched to Neo4j because of how comparatively easy it was to write queries (although the Python support for Neo4j was severely lacking). Never really stressed it out on dataset size though.

I did manage to crash Redis' graph plugin pretty quickly when I was testing that.

SahAssar

Not sure what you consider "quite small" and I don't know how NetworkX works, but postgresql recursive queries have worked well for me for small graphs.

Could you share what the data structure and scale was?

henryfjordan

We basically had a single table that we wanted to be able to nest on itself arbitrarily. Think categories and subcategories, maybe 100k nodes/rows

Postgres worked fine but cypher is so much more expressive and handles stuff like loop detection for you, neo4j was much easier to work with. Performance wasn't ever really an issue with either.

philjohn

A good 10 years ago or so I was running a solution that used RDF Quad Stores - and the best one at the time (after trialling 4Store, Marklogic and some others I can't remember) was OpenLink Virtuoso - how they managed to fit a performant distributed Quad store into what started life as an SQL engine was impressive.

I've left that world now, but if you're in the market for a graph store again, it might be something to look at.

wslh

Have you tried NetworkDisk[1] to manipulate NetworkX graphs on disk?

[1] https://networkdisk.inria.fr/

nemo44x

Neo4j is pretty bad and very dated. No idea what they’re doing. MemGraph is a much better tool.

Really graph is a feature and not a product.

threeseed

Neo4J is mature not dated which is why it's so popular.

And couldn't disagree more that graph is a feature. You really want something optimised for it (query language / storage approach) as the data structure is so different in every way from a relational or document store.

thesz

> ...as the data structure is so different in every way from a relational or document store.

No, it is not.

[1] https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...

Graph processing can create substantial amount of intermediate data if it is done in typical join implementation fashion (nested loops or hash join). So it may appear that graph processing needs a tailored approach.

But what can help graph algorithms can help SQL query execution as well and vice versa, see the link above.

For example, TPC-DS contains queries that (indirectly) joins same tables multiple times (query 4, for example). This is, basically, a kind of centrality metric computation for a graph represented by the tables.

tucnak

How is it different? Isn't a graph basically two sets of tuples: edges and nodes? I played with Cayley (Google) for a little while, & that was my impression.

kiwicopple

My original goal in this article was to figure out if pgrouting would be a good tool to build a memory-layer (for AI/agents) but the article got a bit long. Early results are promising - I’ll follow up with another article soon

there are some other interesting extensions in this space - onesparse[0] is early in development but pretty exciting as it builds on SuiteSparse which is very mature

[0] https://onesparse.com/docs.html

michelpp

Thanks Paul! OneSparse author here, we're still in early dev stages (OneSparse requires some new features in postgres that won't be available until pg18) but my plan is to do a benchmark shootout with various graph tool for postgres. Initial results look good, on my 4-core 11th gen intel laptop we're getting some really good numbers!

             LiveJournal                 Orkut
  Nodes:     3,997,962                   3,072,441
  Edges:     34,681,185                  117,185,037
  Triangles: 177,820,130                 627,583,972

                  Seconds  Edges/Second  Seconds  Edges/Second
  Tri Count LL:   2.69     12,892,634    32.03    3,658,602
  Tri Count LU:   1.78     19,483,812    16.38    7,156,338
  Tri Centrality: 1.45     23,918,059    12.22    9,589,610
  Page Rank:      7.12     4,870,953     23.14    5,064,176

Orkut was as big as I could go due to limited RAM. One of my constrains is limited access to big enough hardware to do the kinds of Graphs Of Unusual Size (billions of edges, trillions of triangles) where we can really flex the scale that CUDA support gives us. Stay tuned!

szvsw

Supabase consistently puts out such fantastic bite-sized gems - and for me some of my favorites have been related to PostGIS - whether it’s about serving tiles directly, or this (ab)use of functionality typically used in a PG geospatial context. Nothing revolutionary or massive and complex - not like reading DDIA of course, but just fun and mentally activating, making me want to jump into something new. I really applaud them for frequently posting actually engaging content that just gets you excited to work with databases… it sounds silly to say it like that, but it does feel like I get regularly struck with the feeling of sadness when I realize how vanilla all of my daily development related interactions with dbs are so vanilla.

nrjames

I’ve always wondered why there isn’t a “SQLite for graphs,” so to speak. Is there something about how they have to be stored that precludes an in-process solution with disk-based storage?

ryangs

https://www.hillelwayne.com/post/graph-types/ gives an interesting take on why we don't see a graph type as a primitive on more programming languages. Essentially boils down to graphs being very vague and depending on the topology of your graphs you are going to want different implementations for reasonable efficiency.

That said, there are graph databases.

nrjames

Interesting! Thanks for the link. I suppose the graph databases just take an opinionated approach. NetworkX is great; I always wished it had a simple backend.

canadiantim

There is now, it's Kuzudb, embedded as well.

holtwork

I'm working on a little Poatgres graph db project. The querying and table structure is much simpler for the same task:

https://memelang.net/03/ https://github.com/memelang-net/memesql3

pjd7

Interesting in hearing some thoughts about using roaring bitmaps stored in a bytea postgres column to represent adjacency matrixes.

I was thinking that given RDS has support for plrust and PostgreSQL's SPI I could use the fact they support croaring-rs there as a crate and build upon that.

I figure I can use that to represent many graph's with say 100s to ~100m nodes and many relations between these things. But each graph would be tenanted to a tenant (company/b2b saas use case).

I was thinking that by using plrust storing the roaring bitmap on the DB server in a bytea and using SPI, I can benefit from the minimal network overhead to mutate and query against the bitmap with croaring. Using SPI locally in the DB server I eliminate network overhead shipping that back to my application code.

PostgreSQL also gives me transaction safety to updates etc. And a bunch of support for other column base data such as my tenant ID column, some JSONB for relationship metadata to query on etc.

Basically something like https://jazco.dev/2024/04/20/roaring-bitmaps/ but on postgres. Given I need to support many tenanted graphs & we're already using citus this seems like something that is feasible at a larger scale too.

I was wondering though if I am going to need to create some operator classes to allow me to index relations a bit better (probably seems likely I think).

I am aware of https://github.com/ChenHuajun/pg_roaringbitmap but would prefer to use int64s and maybe start out on RDS instead of having to add another workload to our self hosted citus cluster/s.

Happy to be told I am fool and any insights would be nice. I am potentially going to try this out on some of our data sets we have because our product team is basically laying out a vision where they want us to have a graph powering a bunch of things.

I don't like the idea of neo4j when we're already deep into PostgreSQL for a bunch of workloads (~20+ TB table workloads etc so we have some reasonable inhouse PG experience).

Also huge thanks to the author of the blog post. I had been looking at pgRouting and wondering with a tilted head.. hmm seems like we can just use this as a graph DB. So that is also on my list to test out.

vincnetas

Any comments on "Apache AGE"?

Apache AGE™ is a PostgreSQL that provides graph database functionality.

https://age.apache.org

demaga

I think Apache AGE is much more generic, as it can parse Cypher queries and comes with a bunch of utility functions.

OP article is more like a hack, and a good one! It seems like you can achieve a lot of what you might expect from graph database with pgRouting functions and good old SQL.

karussell

Maybe a stupid question. When just looking at the data model (and not e.g. the query language) ... but is there a real difference between a "graph" database and a "normal SQL" database when the SQL database is able to directly point to rows (ROWID?) without a separate index?

Nelkins

Anybody have any experience creating isocrhones using PgRouting? I have a use case that involves generating isochrone maps for walking, biking, etc. but I'd like to just use Postgres if possible and avoid another piece of infrastructure like Valhalla, OpenTripPlanner, OpenRouteService, etc.

rubenvanwyk

You can always trust Postgres to have another extension that opens up great new data modelling opportunities. This is great. Wonder how this stacks up to the CedarDB (Postgres-compatible) graph capabilities.

kirmerzlikin

Great article! Although it seems that the last section (about YouTube recommendations) is incomplete, as there's no query to actually calculate the recommendations.

kiwicopple

I’ll see if I can get around to adding the SQL this week

whalesalad

Has anyone done something like this with a very large dataset? Hundreds of millions of rows.

lmeyerov

For sub-billion graph queries, we have been doing in-memory via GFQL that runs graph queries & analytics on top of pandas (columnar CPU dataframes) and cudf (columnar GPU dataframes). Sub-billion is typically small enough that single node is fine. We began dev after years of being annoyed at having no good simple OSS in-process solution for small graphs like these. Avoiding jumping architectural hoops around dealing with multiple systems of record infra is nice, as can often just do in the compute tier.

Once big graphs get involved, scalable systems, and especially those that seperate storage from compute & price accordingly, get much more interesting. We work with our partners like databricks, Google spanner, AWS Neptune, etc, who have different sweet spots that really depend on workload and context, they're all pretty different. OLTP vs OLAP, etc.

HN

Postgres as a Graph Database: (Ab)Using PgRouting

Postgres as a Graph Database: (Ab)Using PgRouting