
Show HN: SQLite Graph Ext – Graph database with Cypher queries (alpha)

17 comments · October 29, 2025

I've been working on adding graph database capabilities to SQLite with support for the Cypher query language. As of this week, both CREATE and MATCH operations work with full relationship support.

Here's what it looks like:

    import sqlite3
    conn = sqlite3.connect(":memory:")
    conn.enable_load_extension(True)  # required before load_extension()
    conn.load_extension("./libgraph.so")
    
    conn.execute("CREATE VIRTUAL TABLE graph USING graph()")
    
    # Create a social network
    conn.execute("""SELECT cypher_execute('
        CREATE (alice:Person {name: "Alice", age: 30}),
               (bob:Person {name: "Bob", age: 25}),
               (alice)-[:KNOWS {since: 2020}]->(bob)
    ')""")
    
    # Query the graph with relationship patterns
    conn.execute("""SELECT cypher_execute('
        MATCH (a:Person)-[r:KNOWS]->(b:Person) 
        WHERE a.age > 25 
        RETURN a, r, b
    ')""")
The interesting part was building the complete execution pipeline - lexer, parser, logical planner, physical planner, and an iterator-based executor using the Volcano model. All in C99 with no dependencies beyond SQLite.
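That Volcano-style pull model can be sketched in a few lines of Python (hypothetical operator names, not the extension's actual C99 code): each operator is an iterator that pulls one row at a time from its child.

```python
# Minimal Volcano-model sketch: scan -> filter -> project, each an
# iterator pulling rows from its child one at a time. Illustrative
# only; the extension implements this in C99.

def scan(rows):
    # Leaf operator: yields every row from the underlying storage.
    for row in rows:
        yield row

def filter_op(child, predicate):
    # WHERE: passes through only rows matching the predicate.
    for row in child:
        if predicate(row):
            yield row

def project(child, keys):
    # RETURN: projects each row down to the requested keys.
    for row in child:
        yield {k: row[k] for k in keys}

nodes = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]
# Roughly: MATCH (a:Person) WHERE a.age > 25 RETURN a.name
plan = project(filter_op(scan(nodes), lambda r: r["age"] > 25), ["name"])
print(list(plan))  # → [{'name': 'Alice'}]
```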

What works now:

- Full CREATE: nodes, relationships, properties, chained patterns (70/70 openCypher TCK tests)
- MATCH with relationship patterns: `(a)-[r:TYPE]->(b)` with label and type filtering
- WHERE clause: property comparisons on nodes (`=`, `>`, `<`, `>=`, `<=`, `<>`)
- RETURN: basic projection with JSON serialization
- Virtual table integration for mixing SQL and Cypher

Performance:

- 340K nodes/sec inserts (consistent to 1M nodes)
- 390K edges/sec for relationships
- 180K nodes/sec scans with WHERE filtering

Current limitations (alpha):

- Only forward relationships (no `<-[r]-` or bidirectional `-[r]-`)
- No relationship property filtering in WHERE (e.g., `WHERE r.weight > 5`)
- No variable-length paths yet (e.g., `[r*1..3]`)
- No aggregations, ORDER BY, or property projection in RETURN
- Strings must use double quotes: `{name: "Alice"}`, not `{name: 'Alice'}`
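The forward-only limitation can often be worked around by writing the pattern in the supported direction yourself. As a sketch, a tiny (hypothetical) rewriter for the simple single-hop reverse form:

```python
import re

def flip_reverse_pattern(q):
    # Rewrite the single-hop reverse form (a)<-[r:T]-(b) into the
    # supported forward form (b)-[r:T]->(a). Sketch only: assumes one
    # simple pattern per query and no nested parentheses.
    pat = re.compile(r"\((?P<left>[^()]*)\)<-\[(?P<rel>[^\]]*)\]-\((?P<right>[^()]*)\)")
    return pat.sub(
        lambda m: f"({m.group('right')})-[{m.group('rel')}]->({m.group('left')})", q
    )

print(flip_reverse_pattern("MATCH (a:Person)<-[r:KNOWS]-(b:Person) RETURN a, r, b"))
# → MATCH (b:Person)-[r:KNOWS]->(a:Person) RETURN a, r, b
```

The two forms match the same rows because only the pattern's direction is flipped; the variable names (and any WHERE/RETURN referring to them) are untouched.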

This is alpha - API may change. But core graph query patterns work! The execution pipeline handles CREATE/MATCH/WHERE/RETURN end-to-end.

Next up: bidirectional relationships, property projection, aggregations. Roadmap targets full Cypher support by Q1 2026.

Built as part of Agentflare AI, but it's standalone and MIT licensed. Would love feedback on what to prioritize.

GitHub: https://github.com/agentflare-ai/sqlite-graph

Happy to answer questions about the implementation!

mentalgear

I like the ambition and the open-source spirit behind your project! Open-source graph databases are fantastic.

That said, I’d encourage you to consider leveraging existing projects rather than starting from scratch. There are already mature, local / in-browser graph databases that could benefit from your skills and vision.

For example:

- Kuzu (https://github.com/kuzudb/kuzu): This project had very active development but was recently archived (as of October 10, 2025). Continuing or forking it could be a game-changer for the community.

- CozoDB (https://www.cozodb.org/): It's very feature-rich and actively seeking contributors. Your expertise could help push it even further.

I do get the appeal of building something from the ground up; it’s incredibly rewarding. But achieving production readiness is seriously challenging and time-consuming. These projects are already years ahead in scope, so contributing to them could accelerate your impact and save you from reinventing the wheel.

gwillen85

Thanks for the suggestions! I'm familiar with both. Different category though - this is a SQLite extension, not a standalone database. The value prop is:

- Zero friction: if you're already using SQLite (Python scripts, mobile apps, embedded systems), just `.load graph_extension` and you have graph capabilities
- Mix SQL + Cypher: join your relational tables with graph traversals in the same query
- Works everywhere SQLite works: serverless functions, Raspberry Pi, iOS apps, wherever
- Leverage SQLite's ecosystem: all existing tools, bindings, and deployment patterns just work
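Because the graph lives in ordinary SQLite tables, the "mix SQL + Cypher" point can already be approximated today with plain SQL over the backing tables. A sketch, assuming the backing-table layout described elsewhere in this thread (`my_graph_nodes` with JSON `labels`/`properties`); all names are illustrative:

```python
import sqlite3

# Join a plain relational table with graph node storage in one query.
# Assumes the extension's backing-table layout described in the thread;
# table and column names here are illustrative, not an official API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (email TEXT, person_name TEXT)")
conn.execute(
    "CREATE TABLE my_graph_nodes (id INTEGER PRIMARY KEY, labels TEXT, properties TEXT)"
)

conn.execute("INSERT INTO accounts VALUES ('alice@example.com', 'Alice')")
conn.execute("""INSERT INTO my_graph_nodes (labels, properties)
                VALUES ('["Person"]', '{"name": "Alice", "age": 30}')""")

# One query spanning both worlds: relational columns + graph properties.
row = conn.execute("""
    SELECT a.email, json_extract(n.properties, '$.age') AS age
    FROM accounts a
    JOIN my_graph_nodes n
      ON json_extract(n.properties, '$.name') = a.person_name
""").fetchone()
print(row)  # → ('alice@example.com', 30)
```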

Kuzu and CozoDB are excellent if you want a dedicated graph database. But if you've already got SQLite (which is everywhere), this lets you add graph features without rearchitecting.

Think of it like SQLite's FTS5 extension for full-text search - you're not competing with Elasticsearch, you're giving SQLite users a lightweight option that fits their existing workflow.

selecsosi

This reminds me of the Apache AGE Postgres extension as well. Very cool work.

gwillen85

Thanks! As a Postgres user first, I really appreciate that comparison. Apache AGE does great work.

Graph databases are crucial for AI memory, especially paired with vector databases. Graph for relationships, vectors for semantic similarity - particularly powerful for embedded systems and robotics where you need lightweight, on-device reasoning.

ble

What kind of constraints can you add?

Could I put a unique constraint on property `id` of all nodes with label X?

Could I put a constraint that edges of kind A must always go from nodes with label X to nodes with label Y?

What kind of indices can you add?

Will SQLite use them when you do a Cypher query?

Will your Cypher query planner take them into account?

gwillen85

On constraints: they're definitely on the roadmap but not in the alpha yet. The syntax like `CREATE CONSTRAINT ON (n:Person) ASSERT n.email IS UNIQUE` is designed to work, and we've got the error-handling infrastructure in place, but the actual enforcement logic is planned for v0.2.0 and beyond. Same goes for relationship constraints - the foundation's there with schema validation, but the complex constraint types are coming in future versions.

Indices are similar - we've got the query planner logic ready to use them, and label indexing is partially implemented as a foundation. Property indexes are definitely planned, with basic support in v0.2.0 and composite indexes following in v0.4.0. The current alpha focuses on getting the core Cypher operations (CREATE/MATCH) solid before layering on the optimization features.

The roadmap shows:

- v0.2.0 (Q1 2026): Property indexes and basic constraints
- v0.4.0 (Q3 2026): Advanced indexing (composite, spatial)
- v1.0.0 (2027): Full constraint support

For now, you can enforce constraints at the application level or use raw SQLite constraints on the backing tables if you need that functionality immediately. The alpha is really about proving the core graph operations work end-to-end before adding the enterprise features.
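The "raw SQLite constraints on the backing tables" route can look like this for the unique-property question above. A sketch only: the table layout mirrors the one described in this thread, and all names (index, table, columns) are illustrative, not the extension's official API.

```python
import sqlite3

# Approximate a unique constraint on property `id` of :Person nodes
# with a partial unique expression index on the backing table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_graph_nodes (id INTEGER PRIMARY KEY, labels TEXT, properties TEXT)"
)

# Only rows whose labels contain "Person" participate in the index.
conn.execute("""
    CREATE UNIQUE INDEX person_id_unique
    ON my_graph_nodes (json_extract(properties, '$.id'))
    WHERE labels LIKE '%"Person"%'
""")

conn.execute("""INSERT INTO my_graph_nodes (labels, properties)
                VALUES ('["Person"]', '{"id": 1}')""")
conn.execute("""INSERT INTO my_graph_nodes (labels, properties)
                VALUES ('["Company"]', '{"id": 1}')""")  # other label: allowed

try:
    conn.execute("""INSERT INTO my_graph_nodes (labels, properties)
                    VALUES ('["Person"]', '{"id": 1}')""")
except sqlite3.IntegrityError:
    print("duplicate Person id rejected")
```

The `LIKE '%"Person"%'` label test is crude (a deterministic expression is required in a partial-index WHERE clause); a real implementation would want a normalized label column.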

leetrout

I have an ELI5 question...

So you're doing the planning and execution which results in what? Some direct calls into sqlite that create tables? Under the hood is this using tables in a conventional manner where there are adjacency lists or just edges and vertexes or ... ?

I'm looking at `graphFindEdgesByType` and it says they're done with SQL queries - are you effectively transpiling some of the Cypher or just have routines that build queries as needed?

Thanks!

gwillen85

Great question!

The storage model is just regular SQLite tables. When you create a graph, it makes two backing tables:

    my_graph_nodes -- id, labels (JSON array), properties (JSON object)
    my_graph_edges -- id, source, target, edge_type, properties (JSON object)

It's an edge list, not adjacency lists.

Query processing is not transpiling Cypher directly. There's a pipeline:

    Cypher → AST → Logical Plan → Physical Plan (optimizer) → Iterators → SQL queries

The iterators generate SQL on the fly to fetch from those backing tables. Basically the Volcano model.

`graphFindEdgesByType` is actually deprecated and is a no-op now. The comment says "edge lookups are done via SQL queries." There used to be in-memory structures, but we moved to just generating SQL like:

    SELECT e.target, e.id, e.edge_type
    FROM my_graph_edges e
    WHERE e.source = 123 AND e.edge_type = 'KNOWS'

So it's "build SQL queries as needed during execution" rather than "transpile the whole Cypher query upfront."
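That per-iterator SQL generation can be sketched in Python (table and column names mirror the edge-list layout described above; everything here is illustrative):

```python
import sqlite3

# Sketch of an "expand" iterator that builds SQL queries as needed
# during execution, one per bound source node. Not the extension's
# actual code; layout mirrors the edge list described in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_graph_edges "
    "(id INTEGER PRIMARY KEY, source INT, target INT, edge_type TEXT, properties TEXT)"
)
conn.executemany(
    "INSERT INTO my_graph_edges (source, target, edge_type, properties) VALUES (?, ?, ?, ?)",
    [(1, 2, "KNOWS", '{"since": 2020}'), (1, 3, "BLOCKS", "{}")],
)

def expand(source_ids, edge_type):
    # For each source node handed down by the parent iterator, emit a
    # fresh query against the edge list and stream the matches up.
    for sid in source_ids:
        cur = conn.execute(
            "SELECT target, id, edge_type FROM my_graph_edges "
            "WHERE source = ? AND edge_type = ?",
            (sid, edge_type),
        )
        yield from cur

print(list(expand([1], "KNOWS")))  # → [(2, 1, 'KNOWS')]
```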

jeffreyajewett

Nothing says weekend project like writing a Cypher planner from scratch in C99. We also recently launched AgentML; check it out: https://github.com/agentflare-ai/agentml (also MIT).

gwillen85

This will also be used in the yet-to-be-released `memlite`, which is our first WASM component for AgentML.

mentalgear

Interesting, yet the XML syntax feels quite verbose vs. JSON, for example.

gwillen85

I agree, but LLMs are very good at generating XML. Additionally, SCXML, which AgentML extends, has been around and finalized for over 15 years, so generating AgentML works incredibly well.