Show HN: Shoggoth Mini – A soft tentacle robot powered by GPT-4o and RL
66 comments
July 15, 2025 · rainingmonkey
floren
Furbies spring to mind... They were a similar shape and size and even had two goggling eyes, but with waggling ears instead of a tentacle.
They'd impress you initially but after some experimentation you'd realize they had a basic set of behaviors that were triggered off a combination of simple external stimuli and internal state. (this is the part where somebody stumbles in to say "dOn'T hUmAnS dO ThE sAmE tHiNg????")
ben_w
To quote, "if the human brain were so simple that we could understand it, we would be so simple that we couldn’t".
So…
> this is the part where somebody stumbles in to say "dOn'T hUmAnS dO ThE sAmE tHiNg????"
…yes, but also no.
Humans will always seem mysterious to other humans, because we're too complex to be modelled by each other. Basic set of behaviours or not.
tweetle_beetle
This groundbreaking research pushed the limits of human-Furby interactions and interfaces: https://www.youtube.com/watch?v=GYLBjScgb7o
oniony
And we should all chip in together to buy that somebody a new keyboard.
Sharlin
People have always been ascribing agency and sapience to things, from fire and flowing water in shamanistic religions, to early automatons that astonished people in the 18th century, to the original rudimentary chatbots, to ChatGPT, to – more or less literally – many other machines that may seem to have a "temperament" at times.
anotherjesse
This feels similar to no longer finding a game fun once I understand the underlying system that generates it. The magic is lessened (even if applying simple rules can generate complex outcomes, it feels determined).
parpfish
Once you discover any min-maxing strategy, games change from “explore this world and use your imagination to decide what to do” to “apply this rule or make peace with knowing that you are suboptimal”.
anyfoo
It's often a bit of a choice, though. You definitely can minmax Civilization, Minecraft, or Crusader Kings III. But then you lose out on the creativity and/or role-playing aspect.
In Minecraft, I personally want to progress in a "natural" (within the confines of the game) way, and build fun things I like. I don't want to speedrun to a diamond armor or whatever.
In Crusader Kings, I actually try to make decisions based on what the character's traits tell me, plus a little bit of my own characterization that I make up in my head.
dmonitor
a poorly designed game makes applying the rules boring. a fun game makes applying the rules interesting.
moron4hire
I've noticed the same thing with voice assistants and constructed languages.
I always set voice assistants to a British accent. It gives enough of a "not from around here" change to the voice that it sounds much more believable to me. I'm sure it's not as believable to an actual British person. But it works for me.
As for conlangs: many years ago, I worked on a game where one of the goals was to have the NPCs dynamically generate dialog. I spent quite a bit of time trying to generate realistic English and despaired that it was just never very believable (I was young; I didn't have a good understanding of what was and wasn't possible).
At some point, I don't remember exactly why, I switched to having the NPCs speak a fictional language. It became a puzzle in the game to have to learn this language. But once you did (and it wasn't hard, they couldn't say very many things), it made the characters feel much more believable. Obviously, the whole run-around was just an avoidance of the Uncanny Valley, where the effort of translation distracted you from the fact that it was all constructed. Though now I'm wondering if enough exposure to the game and its language would eventually make you very fluent in it and you would then start noticing it was a construct.
ben_w
> I'm sure it's not as believable to an actual British person.
FWIW: As a British person, most of the British TTS voices I've tested sound like an American trying to put on something approximating one specific regional accent, only to then accidentally drift between the accents of several other regions.
dylan604
"ah, you hesitated" no more so than on every single other question.
the delay for the GPT to process a response is very unnerving. I find it worse than when the news is interviewing a remote site with a delay between responses. maybe if the eyes had LEDs to indicate activity rather than it just sitting there??? waiting for a GPT to do its thing is always going to force a delay especially when pushing the request to the cloud for a response.
also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic
jszymborski
I wonder how well suited some of the smaller LLMs like Qwen 0.6B would be to this... it doesn't sound like a super complicated task.
I also feel like you can train a model on this task by using the zero-shot performance of larger models to create a dataset, making something very zippy.
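The distillation idea above can be sketched roughly as follows. This is a minimal, hypothetical example: `teacher_label` stands in for a zero-shot call to a larger hosted model (e.g. GPT-4o) and is stubbed with a keyword heuristic so the sketch is self-contained; the action names are invented.

```python
# Hypothetical sketch: using a large model's zero-shot labels to build a
# training set for a small, zippy local model. `teacher_label` is a stub
# standing in for an API call to the larger model.
import json

ACTIONS = ["wave", "point", "nod", "idle"]

def teacher_label(utterance: str) -> str:
    """Stub for the large model's zero-shot classification of a command."""
    lowered = utterance.lower()
    for action in ACTIONS:
        if action in lowered:
            return action
    return "idle"

def build_dataset(utterances):
    """Pair each utterance with the teacher's label, ready for fine-tuning."""
    return [{"text": u, "label": teacher_label(u)} for u in utterances]

dataset = build_dataset([
    "Can you wave at me?",
    "Point at the red cup",
    "Just hang out for a bit",
])
print(json.dumps(dataset, indent=2))
```

In a real pipeline the stub would be replaced by batched API calls, and the resulting pairs used to fine-tune the small model.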
accrual
I wondered the same. Perhaps a local model cached on a 16GB or 24GB graphics card would perform well too. It would have to be a quantized/distilled model, but maybe sufficient, especially with some additional training as you mentioned.
jszymborski
If Qwen 0.6B is suitable, then it could fit in 576MB of VRAM[0].
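A quick back-of-envelope check of that figure (weights only; the KV cache and activations add overhead on top):

```python
# Weight memory for a 0.6B-parameter model at various quantization widths.
PARAMS = 0.6e9

def weight_mib(params: float, bits_per_param: int) -> float:
    """Weight memory in MiB for a given quantization width."""
    return params * bits_per_param / 8 / 2**20

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_mib(PARAMS, bits):,.0f} MiB")
```

At 8-bit quantization this comes out to roughly 572 MiB, consistent with the ~576MB figure above.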
otabdeveloper4
16GB is way overkill for this.
accrual
> also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic
This seems like a good place to leverage a wake word library, perhaps openWakeWord or porcupine. Then the user could wake the device before sending the prompt off to an endpoint.
It could even have a resting or snoozing animation, then have it perk up when the wake word triggers. Eerie to view, I'm sure...
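The gating described above might look something like this. It's a sketch of the control flow only: `detect_wake_word` stands in for a real detector (openWakeWord or Porcupine scoring audio frames) and is stubbed with a transcript check so the example runs as-is.

```python
# Sketch: keep the device in a cheap local loop, and only stream audio to
# the hosted model after the wake word fires. The detector is a stub.
WAKE_WORD = "hey shoggoth"

def detect_wake_word(chunk: str) -> bool:
    """Stand-in for a wake-word model scoring an audio frame."""
    return WAKE_WORD in chunk.lower()

def run_loop(chunks):
    """Snooze locally; 'wake' (and start sending the prompt) on trigger."""
    states = []
    for chunk in chunks:
        if detect_wake_word(chunk):
            states.append("awake")      # perk-up animation, open the stream
        else:
            states.append("snoozing")   # resting animation, nothing sent
    return states

print(run_loop(["background chatter", "Hey Shoggoth, wave!"]))
# → ['snoozing', 'awake']
```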
datameta
This also saves energy to the point of enabling this device to be wireless.
justusthane
> the delay for the GPT to process a response is very unnerving
I'm not sure I agree. The way the tentacle stops moving and shoots upright when you start talking to it gives me the intuitive impression that it's paying attention and thinking. Pretty cute!
dylan604
it's the "thinking" frozen state while it uploads and waits for a GPT response that is unnerving. if the eyes did something to indicate progress is being made, then it would remove the desire to ask it if it is working or something. the last thing I want to be is that PM asking for a status update, but some indication it was actually processing the request would be ideal. even if there was a new animation with the tail like having it spinning or twirling like the ubiquitous spinner to show that something is happening
the snap to attention is a good example of it showing you feedback. the frozen state makes me wonder if it is doing anything or not
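The fix being suggested amounts to a small phase-to-animation mapping, so the robot is never visibly frozen. A hypothetical sketch (the phase names and animations are made up for illustration):

```python
# Map each pipeline phase to a visible animation so there is always
# feedback, including while waiting on the cloud round-trip.
from enum import Enum, auto

class Phase(Enum):
    IDLE = auto()
    LISTENING = auto()
    WAITING_ON_MODEL = auto()
    RESPONDING = auto()

ANIMATION = {
    Phase.IDLE: "slow sway",
    Phase.LISTENING: "snap upright",
    Phase.WAITING_ON_MODEL: "tail twirl (spinner)",
    Phase.RESPONDING: "expressive motion",
}

def animation_for(phase: Phase) -> str:
    return ANIMATION[phase]

print(animation_for(Phase.WAITING_ON_MODEL))
```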
lsaferite
Back when Anki (the robotics company) was building Cozmo, a *lot* of thought was put into making it expressive about everything that was going on. It really did a good job of making it feel "alive", for lack of a better word.
tetha
It clearly needs eyebrows like Johnny 5.
micromacrofoot
beyond the prototyping phase, which hosted models make very easy, there's little reason this couldn't use a very small optimized model on device... it would be significantly faster/safer in an end product (but significantly less flexible for prototyping)
SequoiaHope
This is adorable! I did some research on tentacle robots last year. The official term is “continuum robots” and there’s actually a great deal of research into their development due to their usefulness in medical robotics. This lecture is a great overview for the curious: https://youtu.be/4ktr10H04ak
typs
This is so sick. I agree that it’s a little lame that we have all these AI capabilities right now, robotics improving, and all we can think of making is humanoid robots. Like I want a spider/squid hybrid robot running around my house
mrcwinn
All this concern about AI safety, and this nice person wants a spider-squid hybrid robot running around!
tsunamifury
We are looking to make robotics most compatible with a humanoid world.
That said, he makes a good point that alternate limb types could be interesting as well.
sparrish
Hell no! I've seen this movie and I don't want any face-hugger sitting on my desk.
dunefox
A Lovecraft reference, nice. I'm wondering whether a smaller model would suffice as well.
troyvit
Yeah I came here to say the same thing. It seems like it would simplify things. They do say:
"I initially considered training a single end-to-end VLA model. [...] A cable-driven soft robot is different: the same tip position can correspond to many cable length combinations. This unpredictability makes demonstration-based approaches difficult to scale.[...] Instead, I went with a cascaded design: specialized vision feeding lightweight controllers, leaving room to expand into more advanced learned behaviors later."
I still think circling back to smaller models would be awesome. With some upgrades you might get a locally hosted model on there, but I'd be sure to keep that inside a pentagram so it doesn't summon a Great One.
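The cascaded design quoted above can be sketched roughly as a vision module feeding a lightweight controller. Everything here is illustrative: the function names, the proportional gain, and the 2D tip state are assumptions, not the author's implementation.

```python
# Rough sketch of the cascaded design: specialized vision estimates the
# tentacle tip position; a lightweight P-controller maps tip error to
# cable-length commands. All names and values are illustrative.
def vision_tip_estimate(frame):
    """Stand-in for a vision-based tip tracker; returns an (x, y) tuple."""
    return frame["tip"]

def cable_controller(target, tip, gain=0.5):
    """P-controller: nudge cable lengths proportionally to the tip error."""
    return tuple(gain * (t - p) for t, p in zip(target, tip))

frame = {"tip": (0.0, 2.0)}
command = cable_controller(target=(4.0, 6.0), tip=vision_tip_estimate(frame))
print(command)  # → (2.0, 2.0)
```

The appeal of the cascade is that each stage stays small and debuggable, and either stage can later be swapped for a learned component.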
joshuabaker2
I was surprised it pinged gpt-4o. I was expecting it to use something like https://github.com/apple/ml-fastvlm (obviously cost may have been a factor there), but I can see how the direction he chose would make it more capable of doing more complex behaviours in the future w.r.t adding additional tentacles for movement and so on.
dvngnt_
I've seen enough media from Japan to know where this is heading
huevosabio
This is so cool! I love the idea of adding expressivity to non verbal, non human entities.
accrual
Agreed! I think the Pixar lamp is a great starting point. Have the robot flex and bend, shake yes/no, look curious or upset, and perhaps even control LEDs to express itself.
therealbilliam
I am both super impressed and creeped out
ge96
Get 4, Doc Ock
Also was thinking of Tim Burton's Oogie Boogie
zhyder
Beautiful work! I appreciate how this robot clearly does NOT try to look like any natural creature. I don't want a future where we can't easily distinguish nature from robotics. So far humanoid robots look clearly robotic too: hope that trend continues.
regularfry
I seem to remember that the SpiRobs paper behind the (extremely neat) tentacle mechanism indicated that they were going for a patent.
lukeinator42
If it's described in a paper doesn't that make it prior art though?
blamestross
Not if it is the authors of the paper filing for the patent. Otherwise people would never publish papers.
jameshart
Patents are intended to be the form of first public disclosure of an idea. Disclosing it before patenting it can prevent the patent application being valid.
US has a 1 year grace period. In most countries, any public disclosure makes an idea unpatentable.
varispeed
This always grinds my gears. For some people "discoveries" are so obvious that they don't bother writing a paper, let alone patenting them. Then someone goes and patents it...
What a fascinating intersection of technology and human psychology!
"One thing I noticed toward the end is that, even though the robot remained expressive, it started feeling less alive. Early on, its motions surprised me: I had to interpret them, infer intent. But as I internalized how it worked, the prediction error faded. Expressiveness is about communicating internal state. But perceived aliveness depends on something else: unpredictability, a certain opacity. This makes sense: living systems track a messy, high-dimensional world. Shoggoth Mini doesn’t.
This raises a question: do we actually want to build robots that feel alive? Or is there a threshold, somewhere past expressiveness, where the system becomes too agentic, too unpredictable to stay comfortable around humans?"