How NASA Is Using Graph Technology and LLMs to Build a People Knowledge Graph

behnamoh

> What made you choose memgraph? ... And then Memgraph showed me the cost. That kind of sold me for time for us to be able to do that.

It's an ad post about memgraph.

ctxc

Yes, the domain is memgraph, and it seems to be a marketing case study.

inerte

I know it’s a marketing case study, but:

> Ever wondered how NASA identifies its top experts, forms high-performing teams, and plans for the skills of tomorrow?

Here’s another resource on that, the book “How NASA Builds Teams: Mission Critical Soft Skills for Scientists, Engineers, and Project Teams”: https://appel.nasa.gov/2010/02/18/aa_2-7_f_nasa_teams-html/

jerryseff

Memgraph is laughably expensive - I honestly wonder what anyone actually uses it for outside of companies that just don't care about infra spend.

mbuda

DISCLAIMER: The co-founder and CTO of Memgraph here.

To add more context, Memgraph Enterprise pricing is explained under https://memgraph.com/pricing: "Starting at $25,000 per year for 16 GB, Memgraph has an all-inclusive, simple pricing model that scales with your workload without restrictions. No charge for compute. No charge for replicas. No charge for algorithms. No Surprises."

In addition, Memgraph Community is free (standard BSL license, which turns into Apache2 4 years after release date, https://github.com/memgraph/memgraph/blob/master/licenses/BS...), and it has many features that are usually considered enterprise (users, replication, not a single degradation in performance or scale, etc.).

Please elaborate more about why the pricing seems expensive, or put it into the infra-cost perspective :pray:

smarx007

I think on this site anything that's more expensive than free is considered expensive. Countless arguments have been had on Oracle vs Postgres, including lock-in. I think lock-in is more important to consider than license cost.

To be fair, it is quite nice for the pricing to be transparent. And I think it's somewhat competitive w.r.t. Stardog, for example. The community version is less restricted than Ontotext, for example.

smarx007

If you want a production-grade graph DBMS, you don't have that many OSS options that are reliable and well-supported.

In the relational space, it took OSS options like Postgres many decades (and somehow paid-for person-years) to get to a place where enterprises seriously consider migrating off Oracle to it.

jcgrillo

Are there any? My experience so far with graph databases is a resounding failure.

XorNot

In OSS or generally?

thumbsup-_-

Seems like a very simple use case, given that it will barely be used at scale. A few thousand employee entries and read QPS in the tens? What’s so special about it that it's worth posting?

gitroom

Man, love seeing pushback on automated skill matching. Sometimes it feels like tech folks keep inventing new tools just to dodge actual conversations. Ever wonder if all this automation just makes things colder instead of smarter?

jandrewrogers

> The current graph has about 27K nodes and 230K edges

That is tiny even by historical standards. I was expecting there to be some type of technology here. Why is this interesting?
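For a rough sense of scale, here is a minimal sketch that builds a random directed graph with the quoted node and edge counts in plain Python and runs a two-hop traversal. The random edges are an assumption for illustration only; nothing here reflects NASA's actual data or schema.

```python
import random
from collections import deque

# Sizes quoted in the article; the edges themselves are random (assumption).
NUM_NODES = 27_000
NUM_EDGES = 230_000

rng = random.Random(0)  # fixed seed so the sketch is reproducible
adjacency = {node: [] for node in range(NUM_NODES)}
for _ in range(NUM_EDGES):
    source = rng.randrange(NUM_NODES)
    target = rng.randrange(NUM_NODES)
    adjacency[source].append(target)


def reachable_within(start, max_hops):
    """Breadth-first search: nodes reachable from `start` in at most `max_hops` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


edge_count = sum(len(targets) for targets in adjacency.values())
neighborhood = reachable_within(0, 2)
print(edge_count, len(neighborhood))
```

The whole graph lives in one dictionary of lists, which is the point being made about size; it says nothing about query expressiveness or tooling.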

smarx007

> "To make sure everyone understands that, I prefer label property graphs over RDF."

I have two major issues with virtually all graph DBMSs that are not RDF/SPARQL-based:

1) They do not allow structure-preserving querying. That is, I query a graph and want the results to be a smaller graph. This is trivial in SQL: you just 'SELECT * FROM x WHERE ...' and the result set you get is tabular, just like the table x. In SPARQL, there are CONSTRUCT/DESCRIBE queries that do just that: give you the results as a graph.

2) They don't use any (internationally recognized) standard to represent graph data. RDF is the only such format known to me (ignore all the semantic web stuff associated with it and just consider the format).
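To make point 1 concrete, here is a toy sketch of what "structure-preserving" means, with a graph modeled as a set of (subject, predicate, object) triples. This is plain Python, not SPARQL or any DBMS API; the `construct` helper and the sample data are hypothetical and only mimic the spirit of a CONSTRUCT query.

```python
# Toy model: a graph is a set of (subject, predicate, object) triples.
# The helper name and the data below are made up for illustration.
Triple = tuple[str, str, str]


def construct(graph: set[Triple], predicate: str) -> set[Triple]:
    """Return the subgraph made of the triples that use `predicate`."""
    return {triple for triple in graph if triple[1] == predicate}


graph = {
    ("alice", "hasSkill", "machine_learning"),
    ("alice", "worksOn", "project_x"),
    ("bob", "hasSkill", "robotics"),
}

# The result is itself a set of triples, i.e. a smaller graph.
skills = construct(graph, "hasSkill")

# Because the output has the same shape as the input, queries compose.
skills_again = construct(skills, "hasSkill")
```

Since the output is the same kind of object as the input, it can be queried again, merged with another graph, or serialized in the same format, which is the property the comment is asking for.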

230k edges is peanuts for a graph db. It's like when the number of rows times columns in your SQL DB is 230k. NASA could (should?) have just used Oxigraph, RDF4J, or Jena. Stardog and Ontotext are the paid options. However, it is quite nice to see more interest in graph-based DBMSs in general!

> “Which employees have cross-disciplinary expertise in AI/ML?”

Regarding the study itself, I did not understand who the target user is. I would be more interested in the Lessons Learned 2.0 study (I understand it was attempted once before [1]). I don't think the study at hand would be able to correctly answer questions about expertise.

On the technical side, as far as I understand, the cosine similarity was computed per triplet? In that case, I could see how pgvector could be used for this. Relevance expansion is the only thing in the article that made me think that it would be cool if it works well. But I could see how in a combo of a regular RDF DBMS + pgvector, one could first do a cosine similarity query via pgvector and then compute an (S)CBD [2] of the subject (the from node) of the triplet.

[1]: https://youtu.be/QEBVoultYJg?t=1653

[2]: https://patterns.dataincubator.org/book/bounded-description....
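The two-step idea above (similarity search first, then a bounded description of the matched subject) can be sketched roughly as follows. This is pure Python standing in for pgvector and an RDF store; the embeddings, the triples, and the simplified `bounded_description` helper are all assumptions for illustration, and the helper only approximates a CBD by following blank-node objects (written here with a `_:` prefix).

```python
import math

Triple = tuple[str, str, str]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def bounded_description(graph: set[Triple], node: str) -> set[Triple]:
    """Simplified CBD: triples with `node` as subject, recursing into blank-node objects."""
    result: set[Triple] = set()
    seen: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for triple in graph:
            if triple[0] == current:
                result.add(triple)
                if triple[2].startswith("_:"):  # blank node: include its description too
                    stack.append(triple[2])
    return result


# Hypothetical per-triple embeddings (made-up 3-dimensional vectors).
embedded = {
    ("alice", "hasSkill", "machine_learning"): [0.9, 0.1, 0.0],
    ("alice", "memberOf", "_:team1"): [0.2, 0.7, 0.1],
    ("_:team1", "focusArea", "earth_science"): [0.1, 0.8, 0.1],
    ("bob", "hasSkill", "propulsion"): [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]  # stand-in for an embedded user question

# Step 1: nearest triple by cosine similarity (the part pgvector would do).
best = max(embedded, key=lambda t: cosine_similarity(query_vector, embedded[t]))

# Step 2: expand to a bounded description of that triple's subject.
description = bounded_description(set(embedded), best[0])
```

In a real setup the first step would be an indexed vector query and the second a DESCRIBE-style query against the triple store; the sketch only shows how the two pieces fit together.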

demaga

> 27K nodes and 230K edges

This is such an overkill for that kind of data. Even if they do plan to "scale up significantly", I doubt that they'll actually experience any benefit of graph db.

mmooss

Why do you say that?

dpflan

As an alternative to a pure graph db (e.g. here, memgraph), has anyone here used Apache's AGE graph-database extension for Postgresql? For making a knowledge graph that can live alongside SQL?

dgllghr

I believe AGE has unfortunately been defunded (https://github.com/apache/age/discussions/2150). It’s a shame, because being able to query data across multiple paradigms seemed really useful.

dcreater

Talks extensively about the details of the thing.

But doesn't actually show the thing.

That's an AI hype-cycle signal for a probably bullshit/defective thing.

citizenpaul

My experience with tools like this is that they have only one outcome: piling work onto the most talented or desperate (i.e., those who need money or a visa) people until they leave the org/company. This eventually leads to total skill erosion and a very low average skill/productivity across the company as people leave or hide their abilities.

Why? Because there is never a reward attached. Oh, you want to make me the AI resource for the agency but not remove former duties or increase my pay? Ummmm, no thanks. Also, things tend to happen in waves (e.g., "AI"), so everyone needs a lot from a very few people at the same time. No one ever asks how those people can be empowered, just how to put the screws to them so they work harder.

HR and Mgmt can f-off with their "skill resource bank" or whatever nonsense they call it this year. My skills are what I was hired for on the job description. If you want to discuss a new position or higher pay for different skills, I'm very happy to talk about how I can work with the org to make that happen. That's never the case, though.

cebert

I think I found a place DOGE can save some money. Memgraph pricing is ridiculous.

patcon

Even paying a college grad to babysit a server costs more than their yearly rate. I assume you're speaking as someone who loves to host everything for themselves, but the logic is surely different in enterprise/government, no?

cebert

It depends on your usage model, but if you compare it to AWS Neptune, the pricing seems quite high. I doubt NASA is running queries 24x7 for this use case, so other options could be less expensive.
