GibberLink [AI-AI Communication]

dweekly

It's a little depressing to have reinvented the modem, only 10,000 times less efficient.

At the point that two AIs discover that they are talking to each other, wouldn't it almost certainly be true that both could access the Internet and therefore the right thing to do (if more than a few bits of information need to be shared) is to exchange endpoint information and hang up the call to be able to communicate directly?

aprentic

Presumably both of these AIs have additional information beyond what was in their initial training.

The hotel agent probably has a RAG that points at their various customer and inventory databases. The user agent has individualized information about the customer. Both of them have also likely had SFT steps that further differentiate them.

The interesting question is whether GibberLink lets the AIs do something they couldn't otherwise do with natural languages. Does it lower some error rate? Does it reduce the time it takes to send messages? Does it effectively give the AIs additional vocabulary?

If I had to guess at the internals, they probably took the token encodings and mapped them on to tones. Then it just throws text or audio through the decoding filter and passes it back.

If that's the case, the benefits are probably limited to slightly faster communications (It's essentially a simple, lossless compression) and a slightly lower error rate (beeps are easier to correct in noisy environments).
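To make that guess concrete, here is a minimal sketch of mapping data onto tones and back. It is only an illustration of the idea, not how GibberLink or ggwave actually work: the base frequency, the tone spacing, and the function names are all assumptions, and it works on frequency values rather than real audio.

```python
# Hypothetical scheme: each 4-bit nibble gets its own tone frequency.
BASE_HZ = 1000.0  # assumed frequency for nibble value 0
STEP_HZ = 50.0    # assumed spacing between adjacent tones


def bytes_to_tones(data: bytes) -> list[float]:
    """Turn each byte into two tone frequencies (high nibble, then low nibble)."""
    tones = []
    for b in data:
        for nibble in (b >> 4, b & 0x0F):
            tones.append(BASE_HZ + nibble * STEP_HZ)
    return tones


def tones_to_bytes(tones: list[float]) -> bytes:
    """Snap each frequency to the nearest known tone and rebuild the bytes."""
    nibbles = []
    for f in tones:
        n = round((f - BASE_HZ) / STEP_HZ)
        nibbles.append(max(0, min(15, n)))  # clamp to the valid nibble range
    pairs = zip(nibbles[0::2], nibbles[1::2])
    return bytes((hi << 4) | lo for hi, lo in pairs)


if __name__ == "__main__":
    message = b"room for 2?"
    tones = bytes_to_tones(message)
    # A small detune still decodes, because decoding snaps to the nearest tone.
    detuned = [f + 10.0 for f in tones]
    print(tones_to_bytes(detuned) == message)
```

Snapping to the nearest tone is what makes beeps forgiving of small frequency errors; it does nothing for dropped or inserted symbols, which is where real error correction would come in.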

johnla

That might not be practical or possible in the early days, but this does seem like a bridge to the natural next step /u/dweekly describes, which would quickly phase out the GibberLink protocol.

flemhans

A standardized chime at the beginning of the phone call could serve to alert humans as well as AI agents that the party they are talking to is an AI, eliminating the first part of the conversation.

giancarlostoro

I love it, we are returning back to 64k internet where the phone starts screeching to get you where you need to be.

_blk

Bring back Cap'n Crunch and the rainbow books!

aprentic

I'm not sure we actually want to bring back John Draper.

https://en.wikipedia.org/wiki/John_Draper#Allegations_of_sex...

elcritch

The Japanese figured this out already! We just say moshi-moshi when we start a phone call [1]. Originally it was believed ghosts couldn't say moshi-moshi (I can speak; I can speak). Now we can apply it to the ghosts in our machines. Who's gonna tell OpenAI? ;)

1: https://www.reddit.com/r/etymology/comments/13hc2gw/why_was_...

_blk

Yes, pleaaaase! Just make sure it's audible for most people. Otherwise one trick that works well is insulting the caller. Computers tend to react quite differently from humans. J/k, don't do it (if you're not certain that you're insulting a bot).

hooverd

That would be my cue to hang up. A computer should never talk to you first.

Majromax

> A computer should never talk to you first.

Isn't that exactly what happens with every IVR (phone menu) system?

hooverd

They hopefully don't call you first. I was thinking of Google calling restaurants on your behalf to check for reservations. It's not valuing the other party's time. Unless they have their own systems.

See everyone using their own LLMs to write paragraphs that will never be read and only summarized by an LLM on the other end. We've achieved negative compression.

dingnuts

yes and it is infuriating. I don't think anyone wants more of those. But it will be fun prompt injecting AI agents with my mouth in the coming decades. Beats the old hacks for beating the IVR systems to get to a person

noja

A standardised sound throughout

empath75

Everyone is right that the protocol is the wrong one to use, but there _should_ actually be some formally documented handshake for ai-agents to use to agree on an outside protocol to switch to.
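As a rough idea of what such a handshake could look like, here is a minimal sketch: one agent sends an offer listing the protocols it supports plus where to reach it, and the other picks the first one it also supports. Nothing here is a real standard; the field names, the protocol labels, and the example endpoint are all made up for illustration.

```python
import json


def make_offer(conversation_id: str, endpoint: str, protocols: list[str]) -> str:
    """Build a compact offer to send over the call (hypothetical format)."""
    offer = {"v": 1, "id": conversation_id, "endpoint": endpoint, "protocols": protocols}
    # Compact separators keep the payload small for a slow audio channel.
    return json.dumps(offer, separators=(",", ":"))


def choose_protocol(offer_json: str, supported: set[str]):
    """Pick the first offered protocol the receiver supports, or None."""
    offer = json.loads(offer_json)
    for protocol in offer["protocols"]:
        if protocol in supported:
            return {"id": offer["id"], "protocol": protocol, "endpoint": offer["endpoint"]}
    return None  # no overlap: stay on the phone call


if __name__ == "__main__":
    offer = make_offer("c-1234", "agent.example.com", ["wss", "https"])
    print(choose_protocol(offer, {"https"}))
    print(choose_protocol(offer, {"smtp"}))
```

If there is no overlap the agents simply carry on over the call, so a phone-only setup still works.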

ricktdotorg

you're right -- there SHOULD be!

jarbus

Brilliant. I don't feel this is practical, but I love the creativity.

willwade

This reminds me of Chirp https://archive.is/HEC29

https://audioxpress.com/news/data-over-sound-pioneer-chirp-a...

hansonkd

Why doesn't it just communicate a unique conversation ID and then use a backchannel, like opening up a web connection, instead? Is it assuming that you are able to make a call but not connect to the internet?

bibimsz

this is using phone only

nimish

This is the equivalent of the Yo app but for """AI"""

This is also high art. This needs to be in MOMA or something.

I love this.

jrh3

Use English—the power of plain text.

bibimsz

time is money

m3kw9

I wonder how well it can listen when there is a lot of background noise with this type of sound

shrubble

What’s the RTTY protocol?

megadata

According to the page they're using https://github.com/ggerganov/ggwave

MrG3D

Soon to be replaced by IEEE P2874.

AKSF_Ackermann

They stepped on every single rake possible, didn't they?

1. Why are you making a phone call in the first place? Your agent probably got the number from the internet, so just keep using that.

2. If you insist on initiating the conversation over a phone call, why not immediately terminate the call and, again, go over the internet once you realize that it is an AI-to-AI conversation?

3. You did in fact reinvent a modem, but worse: the quoted speed on that library is 8-16 bytes/sec, and I would like to point out that the Bell 103 did ~37 bytes/sec and was released in 1962.
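The arithmetic behind point 3 can be sketched as follows. The 8-16 bytes/sec figure is the one quoted above, and the Bell 103 number comes from its 300 bit/s line rate divided by 8, ignoring start and stop bits. The 200-byte payload is just an assumed example size.

```python
BELL_103_BITS_PER_SEC = 300      # Bell 103 line rate
GGWAVE_BYTES_PER_SEC = (8, 16)   # range quoted in the comment above


def seconds_to_send(num_bytes: float, bytes_per_sec: float) -> float:
    """Time to push num_bytes through a channel at the given raw rate."""
    return num_bytes / bytes_per_sec


# 300 bit/s / 8 bits per byte = 37.5 bytes/sec (framing overhead ignored)
bell_bytes_per_sec = BELL_103_BITS_PER_SEC / 8

if __name__ == "__main__":
    payload = 200  # hypothetical message size in bytes
    slow, fast = GGWAVE_BYTES_PER_SEC
    print(f"Bell 103:      {seconds_to_send(payload, bell_bytes_per_sec):.1f} s")
    print(f"quoted (fast): {seconds_to_send(payload, fast):.1f} s")
    print(f"quoted (slow): {seconds_to_send(payload, slow):.1f} s")
```

So on raw rate alone the 1960s modem comes out roughly 2.3x to 4.7x faster for the same payload.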

bibimsz

It comes down to the problem statement and what the constraints are. This is solving for using the phone-only scenario, which is perfectly valid.

If you want to address a phone-with-internet-backchannel scenario, that's valid too - but it assumes a different problem statement and constraints.

AKSF_Ackermann

Please pay more attention to point 3 in my original post. To reiterate: their encoding is hilariously bad, and is easily outcompeted by a modem from the 60s.

0xDEFACED

you're missing the forest for the trees. the library this demo is using for audio encoding (ggwave) was not made by the creators of this demo. speed (or lack thereof) aside, having a direct audio<->text encoding is much more computationally efficient than speech<->text generation.

on the subject of the encoding efficiency, the ggwave repo mentions the use of Reed-Solomon error correction to make transmission more reliable. I'm struggling to find any info on error correction used by Bell 103 or other modems, but if they aren't as robust that could partially explain the discrepancy you're describing

swexbe

Sounds more futuristic than old dial-up sounds though

godelski

I think the most important part is the bitrate. As you said elsewhere: "time is money". Seems like you're not saving that much money

rglullis

4. If you are an agent receiving a call, why not announce it right away?
