Show HN: Semantic Calculator (king-man+woman=?)
105 comments
May 14, 2025
godelski
data + plural = number
data - plural = research
king - crown = (didn't work... crown gets circled in red)
king - princess = emperor
king - queen = kingdom
queen - king = worker
king + queen = queen + king = kingdom
boy + age = (didn't work... boy gets circled in red)
man - age = woman
woman - age = newswoman
woman + age = adult female body (tied with man)
girl + age = female child
girl + old = female child
The other suggestions are pretty similar to the results I got in most cases. But I think this helps illustrate the curse of dimensionality (i.e. distances are ill-defined in high dimensional spaces). This is still quite an unsolved problem and seems a pretty critical one to resolve that doesn't get enough attention.
n2d4
For fun, I pasted these into ChatGPT o4-mini-high and asked it for an opinion:
data + plural = datasets
data - plural = datum
king - crown = ruler
king - princess = man
king - queen = prince
queen - king = woman
king + queen = royalty
boy + age = man
man - age = boy
woman - age = girl
woman + age = elderly woman
girl + age = woman
girl + old = grandmother
The results are surprisingly good, I don't think I could've done better as a human. But keep in mind that this doesn't do embedding math like OP! Although it does show how generic LLMs can solve some tasks better than traditional NLP.
The prompt I used:
> Remember those "semantic calculators" with AI embeddings? Like "king - man + woman = queen"? Pretend you're a semantic calculator, and give me the results for the following:
franga2000
This is an LLM approximating a semantic calculator, based solely on trained-in knowledge of what that is and probably a good amount of sample output, yet somehow beating the results of a "real" semantic calculator. That's crazy!
The more I think about it, the less surprised I am, but my initial thoughts were quite simply "no way" - surely an approximation of an NLP model made by another NLP model can't beat the original, but the LLM training process (and data volume) is just so much more powerful I guess...
CamperBob2
This is basically the whole idea behind the transformer. Attention is much more powerful than embedding alone.
nbardy
I hate to be pedantic, but the llm is definitely doing embedding math. In fact that’s all it does.
refulgentis
...welcome to ChatGPT, everyone! If you've been asleep since...2022?
(some might say all an LLM does is embeddings :)
mathgradthrow
Distance is extremely well defined in high dimensional spaces. That isn't the problem.
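For context on the disagreement above, here is a small sketch. Distances remain well defined in any dimension, but for points drawn uniformly at random, the nearest and farthest neighbors of a query end up at nearly the same distance as the dimension grows. The function name `relative_contrast`, the point count, and the uniform random data are all assumptions for illustration; real embeddings are not uniform noise, so the effect may be weaker there.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for one random query point.

    Points are drawn uniformly from the unit hypercube; this is an assumed
    toy setup, not a model of real embedding spaces.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

A large value means "nearest" and "farthest" are clearly different; a value close to zero means every point is roughly equally far away, which is what makes nearest-neighbor rankings less informative.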
Affric
Yeah I did similar tests and got similar results.
Curious tool but not what I would call accurate.
gweinberg
I got a bunch of red stuff also. I imagine the author cached embeddings for some words but not really all that many to save on credits. I gave it mermaid - woman and got merman, but when I tried to give it boar + woman - man or ram + woman - man, it turns out it has never heard of rams or boars.
thatguysaguy
Can you elaborate on what the unsolved problem you're referring to is?
montebicyclelo
> king-man+woman=queen
Is the famous example everyone uses when talking about word vectors, but is it actually just very cherry picked?
I.e. are there a great number of other "meaningful" examples like this, or do you actually end up, the majority of the time, with some kind of vaguely tangentially related word when adding and subtracting word vectors?
(Which seems to be what this tool is helping to illustrate, having briefly played with it, and looked at the other comments here.)
(Btw, not saying wordvecs / embeddings aren't extremely useful, just talking about this simplistic arithmetic)
jbjbjbjb
Well when it works out it is quite satisfying
India - Asia + Europe = Italy
Japan - Asia + Europe = Netherlands
China - Asia + Europe = Soviet-Union
Russia - Asia + Europe = European Russia
calculation + machine = computer
bee_rider
Hmm, well I got
cherry - picker = blackwood
if that helps.
gregschlom
Also, as I just learned the other day, the result was never equal, just close to "queen" in the vector space.
Retr0id
I think it's slightly uncommon for the vectors to "line up" just right, but here are a few I tried:
actor - man + woman = actress
garden + person = gardener
rat - sewer + tree = squirrel
toe - leg + arm = digit
raddan
> is it actually just very cherry picked?
100%
groby_b
I think it's worth keeping in mind that word2vec was specifically trained on semantic similarity. Most embedding APIs don't really give a lick about the semantic space.
And, worse, most latent spaces are decidedly non-linear. And so arithmetic loses a lot of its meaning. (IIRC word2vec mostly avoided nonlinearity except for the loss function). Yes, the distance metric sort-of survives, but addition/multiplication are meaningless.
(This is also the reason choosing your embedding model is a hard-to-reverse technical decision - you can't just transform existing embeddings into a different latent space. A change means "reembed all")
spindump8930
First off, this interface is very nice and a pleasure to use, congrats!
Are you using word2vec for these, or embeddings from another model?
I also wanted to add some flavor since it looks like many folks in this thread haven't seen something like this - it's been known since 2013 that we can do this (but it's great to remind folks especially with all the "modern" interest in NLP).
It's also known (in some circles!) that a lot of these vector arithmetic things need some tricks to really shine. For example, excluding the words already present in the query[1]. Others in this thread seem surprised at some of the biases present - there's also a long history of work on that [2,3].
[1] https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...
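A minimal sketch of the exclusion trick mentioned above, using made-up 3-d vectors rather than real embeddings. The vectors, the `analogy` helper, and its `exclude_inputs` flag are all hypothetical. "man" and "woman" are deliberately placed close together, as they tend to be in real models, so the arithmetic barely moves away from "king".

```python
import math

# Toy vectors invented for illustration; real embeddings have hundreds of
# dimensions and come from a trained model.
VECS = {
    "king":  [0.9, 0.5, 0.1],
    "queen": [0.9, 0.5, 0.6],
    "man":   [0.1, 0.5, 0.1],
    "woman": [0.1, 0.5, 0.3],
    "apple": [0.0, 0.3, 0.3],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(plus, minus, exclude_inputs=True):
    """Add and subtract word vectors, then return the closest vocabulary word."""
    dim = len(next(iter(VECS.values())))
    target = [0.0] * dim
    for word in plus:
        target = [t + v for t, v in zip(target, VECS[word])]
    for word in minus:
        target = [t - v for t, v in zip(target, VECS[word])]
    # Optionally drop every word that appeared in the query itself.
    skip = set(plus) | set(minus) if exclude_inputs else set()
    candidates = [w for w in VECS if w not in skip]
    return max(candidates, key=lambda w: cosine(VECS[w], target))


print(analogy(["king", "woman"], ["man"], exclude_inputs=False))  # king
print(analogy(["king", "woman"], ["man"]))                        # queen
```

In this toy setup the computed vector is never equal to the "queen" vector; "queen" only wins once "king" is removed from the candidate list.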
nxa
Thank you! I actually had a hard time finding prior work on this, so I appreciate the references.
The dictionary is based on https://wordnet.princeton.edu/, no word2vec. It's just a plain lookup among precomputed embeddings (with mxbai-embed-large). And yes, I'm excluding words that are present in the query because.
It would be interesting to see how other models perform. I tried one (forgot the name) that was focused on coding, and it didn't perform nearly as well (in terms of human joy from the results).
kaycebasques
(Question for anyone) how could I go about replicating this with Gemini Embedding? Generate and store an embedding for every word in the dictionary?
nxa
Yes, that's pretty much what it is. Watch out for homographs.
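A rough sketch of that precompute-and-store step. `placeholder_embed` is a stand-in that only hashes the text; it is not any real embedding API, and you would swap in a call to whichever model you use. Keying each entry by word plus sense number, and embedding the word together with its definition, is one assumed way to keep homographs apart, not necessarily what this tool does.

```python
import hashlib
import json


def placeholder_embed(text, dim=8):
    """Hypothetical stand-in for a real embedding call.

    Derives a deterministic pseudo-vector from a hash. It carries no meaning
    and exists only so the sketch runs without a model.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def build_index(entries, embed=placeholder_embed):
    """Embed each (word, definition) entry once and key it by word and sense."""
    index = {}
    sense_counts = {}
    for word, definition in entries:
        sense_counts[word] = sense_counts.get(word, 0) + 1
        key = f"{word}#{sense_counts[word]}"
        index[key] = {
            "definition": definition,
            "vector": embed(f"{word}: {definition}"),
        }
    return index


entries = [
    ("bass", "a low-pitched voice or instrument"),
    ("bass", "a freshwater fish"),
    ("king", "a male monarch"),
]
index = build_index(entries)

# Serialize once; writing this string to a file gives a reusable lookup table.
serialized = json.dumps(index)
loaded = json.loads(serialized)
print(sorted(loaded))
```

At query time the tool then only needs lookups and vector arithmetic over `loaded`, with no further calls to the embedding model.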
GrantMoyer
These are pretty good results. I messed around with a dumber and more naive version of this a few years ago[1], and it wasn't easy to get sensible output most of the time.
lcnPylGDnU4H9OF
Some of these make more sense than others (and bookshop is hilarious even if it's only the best answer by a small margin; no shade to bookshop owners).
map - legend = Mercator projection
noodle - wheat = egg noodle
noodle - gluten = tagliatelle
architecture - calculus = architectural style
answer - question = comment
shop - income = bookshop
curry - curry powder = cuisine
rice - grain = chicken and rice
rice + chicken = poultry
milk + cereal = grain
blue - yellow = Fiji
blue - Fiji = orange
blue - Arkansas + Bahamas + Florida - Pluto = Grenada
antidnan
Neat! Reminds me of infinite craft
thaumasiotes
I went to look at infinite craft.
It provides a panel filled with slowly moving dots. Right of the panel, there are objects labeled "water", "fire", "wind", and "earth" that you can instantiate on the panel and drag around. As you drag them, the background dots, if nearby, will grow lines connecting to them. These lines are not persistent.
And that's it. Nothing ever happens, there are no interactions except for the lines that appear while you're holding the mouse down, and while there is notionally a help window listing the controls, the only controls are "select item", "delete item", and "duplicate item". There is also an "about" panel, which contains no information.
n2d4
In the panel, you can drag one of the items (eg. Water) onto another one (eg. Earth), and it will create a new word (eg. Plant). It uses AI, so it goes very deep
thaumasiotes
No, that was the first thing I tried. The only thing that happens is that the two objects will now share their location. There are no interactions.
__MatrixMan__
Here's a challenge: find something to subtract from "hammer" which does not result in a word that has "gun" as a substring. I've been unsuccessful so far.
mrastro
The word "gun" itself seems to work. Package this as a game and you've got a pretty fun game on your hands :)
aniviacat
Gun related stuff works: bullet, holster, barrel
Other stuff that works: key, door, lock, smooth
Some words that result in "flintlock": violence, anger, swing, hit, impact
neom
if I'm allowed only 1 something, I can't find anything either, if I'm allowed a few somethings, "hammer - wine - beer - red - child" will get you there. Guessing given that a gun has a hammer and is also a tool, it's too heavily linked in the small dataset.
Retr0id
Well that's easy, subtract "gun" :P
downboots
Bullet
soxfox42
hammer - red = lock
lightyrs
I don't get it but I'm not sure I'm supposed to.
life + death = mortality
life - death = lifestyle
drug + time = occasion
drug - time = narcotic
art + artist + money = creativity
art + artist - money = muse
happiness + politics = contentment
happiness + art = gladness
happiness + money = joy
happiness + love = joy
bee_rider
Life + death = mortality
is pretty good IMO, it is a nice blend of the concepts in an intuitive manner. I don’t really get drug + time = occasion
But drug - time = narcotic
Is kind of interesting; one definition of narcotic is
> a drug (such as opium or morphine) that in moderate doses dulls the senses, relieves pain, and induces profound sleep but in excessive doses causes stupor, coma, or convulsions
https://www.merriam-webster.com/dictionary/narcotic
So we can see some element of losing time in that type of drug. I guess? Maybe I’m anthropomorphizing a bit.
grey-area
Does the system you’re querying ‘get it’? From the answers it doesn’t seem to understand these words or their relations. Once in a while it’ll hit on something that seems to make sense.
galaxyLogic
What about starting with the result and finding set of words that when summed together give that result?
That could be seen as trying to find the true "meaning" of a word.
rdlw
I've always wondered if there's a way to find which vectors are most important in a model like this. The gender vector man-woman or woman-man is the one always used in examples, since English has many gendered terms, but I wonder if it's possible to generate these pairs given the data. Maybe to list all differences of pairs of vectors, and see if there are any clusters. I imagine some grammatical features would show up, like the plurality vector people-person, or the past tense vector walked-walk, but maybe there would be some that are surprisingly common but don't seem to map cleanly to an obvious concept.
Or maybe they would all be completely inscrutable and man-woman would be like the 50th strongest result.
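The "list all differences and look for clusters" idea could be sketched roughly like this. The 4-d vectors are invented and far cleaner than real embeddings, and the greedy grouping with a cosine threshold is just one simple assumed approach; `cluster_differences` and its `threshold` parameter are hypothetical names. On a real vocabulary the number of pairs grows quadratically, so you would likely need sampling or an approximate method.

```python
import math
from itertools import combinations

# Invented toy vectors; the axes loosely stand for gender, royalty, tense
# and kind of motion.
VECS = {
    "man":    [1, 0, 0, 0],
    "woman":  [-1, 0, 0, 0],
    "king":   [1, 1, 0, 0],
    "queen":  [-1, 1, 0, 0],
    "walk":   [0, 0, 1, 0],
    "walked": [0, 0, -1, 0],
    "jump":   [0, 0, 1, 1],
    "jumped": [0, 0, -1, 1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_differences(vecs, threshold=0.99):
    """Greedily group pairwise difference vectors that point the same way."""
    clusters = []  # each cluster: {"rep": difference vector, "pairs": labels}
    for a, b in combinations(vecs, 2):
        diff = [x - y for x, y in zip(vecs[a], vecs[b])]
        for cluster in clusters:
            if cosine(diff, cluster["rep"]) >= threshold:
                cluster["pairs"].append(f"{a}-{b}")
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append({"rep": diff, "pairs": [f"{a}-{b}"]})
    # Only directions shared by more than one pair are interesting.
    shared = [c["pairs"] for c in clusters if len(c["pairs"]) > 1]
    return sorted(shared, key=len, reverse=True)


for group in cluster_differences(VECS):
    print(group)
```

Because the comparison uses signed cosine, man-woman and woman-man would count as opposite directions; here each pair is only visited once, in vocabulary order.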
jumploops
This is super neat.
I built a game[0] along similar lines, inspired by infinite craft[1].
The idea is that you combine (or subtract) “elements” until you find the goal element.
I’ve had a lot of fun with it, but it often hits the same generated element. Maybe I should update it to use the second (third, etc.) choice, similar to your tool.
nxa
This might be helpful: I haven't implemented it in the UI, but from the API response you can see what the word definitions are, both for the input and the output. If the output has homographs, likeliness is split per definition, but the UI only shows the best one.
Also, if it gets buried in comments, proper nouns need to be capitalized (Paris-France+Germany).
I am planning on patching up the UI based on your feedback.
I've been playing with embeddings and wanted to try out what results the embedding layer will produce based on just word-by-word input and addition / subtraction, beyond what many videos / papers mention (like the obvious king-man+woman=queen). So I built something that doesn't just give the first answer, but ranks the matches based on distance / cosine similarity. I polished it a bit so that others can try it out, too.
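For readers who want to see the ranking step spelled out, here is a minimal sketch with invented 3-d vectors. It is not the site's actual code: the `rank` helper, the toy vocabulary, and the choice to report distance as `1 - similarity` are all assumptions for illustration.

```python
import math

# Invented toy vectors; the axes loosely stand for "capital city",
# "France" and "Germany".
VOCAB = {
    "Paris":   [1.0, 1.0, 0.0],
    "France":  [0.0, 1.0, 0.0],
    "Germany": [0.0, 0.0, 1.0],
    "Berlin":  [1.0, 0.0, 1.0],
    "Munich":  [0.5, 0.0, 1.0],
    "Rome":    [1.0, 0.0, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(target, vocab, k=3):
    """Return the k vocabulary words most similar to target, best first."""
    scored = [(word, cosine(vec, target)) for word, vec in vocab.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Paris - France + Germany, written as signed terms.
query = {"Paris": 1, "France": -1, "Germany": 1}
target = [sum(sign * VOCAB[w][i] for w, sign in query.items()) for i in range(3)]

# Words from the query itself are left out of the candidate list.
candidates = {w: v for w, v in VOCAB.items() if w not in query}

for word, score in rank(target, candidates):
    print(f"{word}: similarity={score:.3f}, distance={1 - score:.3f}")
```

Returning the whole ranked list instead of only the top match makes it easier to see when the runner-up is nearly as close as the winner.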
For now, I only have nouns (and some proper nouns) in the dataset, and pick the most common interpretation among the homographs. Also, it's case sensitive.