Launch HN: Pinch (YC W25) – Video conferencing with immersive translation
25 comments
February 4, 2025

skeeter2020
>> The system continuously learns and improves from usage while maintaining privacy and security.
Are you training your own translation models? or using third-party services?
>> Think real-time translation + natural expressions + perfect lip sync.
not yet, based on the demo.
FlamingMoe
I work with a lot of overseas developers who speak with thick accents and sometimes it can be very difficult to understand them or for them to understand me. I could definitely see this being a more pleasant experience for everyone.
lolpanda
Great idea! The demo looks impressive. What are your thoughts on real-time translated captioning compared to AI voice? I guess it's still difficult to mimic nonverbal elements like laughter and pauses.
tpae
I really like the concept, but I don't understand why you guys are building an entire video conferencing platform. That sounds like years of work building the network and millions in VC funding. It could be a standalone app that exports video to existing conferencing services. I would pay good money for that.
christiansafka
Thanks! We have a virtual camera on our roadmap as well, but by building the conferencing platform end to end we can optimize both latency and conversation UX to a much higher degree. We're also lucky to be building this now and not five years ago - there are some solid webrtc infra companies and open source projects to build on.
michaelmior
By export I assume you mean as a virtual webcam? I would definitely prefer that as a user to be able to use any videoconferencing app.
brap
Assuming the tech is solid, I think that if you had developed this as a browser extension to work on top of Meet/Teams/etc, not only would your dev time have been much shorter and adoption much faster, but Google/Microsoft/etc would probably have bought you out in the blink of an eye.
debarshri
I'm not sure the dev time would be shorter, because Teams and Meet have their own nuances, and you'd be limited to whatever the tool itself allows. Also, I don't think you would go into every call with this plugin on.
This is very valuable where the communication barrier is high, and it has specialized use cases in industries like supply chain and outsourcing.
hassleblad23
Getting bought out is not a bad option here.
skylerwiernik
Cool idea, but just watching your demo it looks like it doesn’t work. Is there any change in the video? The lip movements certainly don’t look synchronized, and audio often continues after the person stops talking. It also doesn’t do any audio mimicking. It really doesn’t look like it does anything that Google Translate doesn’t.
christiansafka
Appreciate the feedback. On the video side, we currently synchronize it to play out with the translated audio (as often as possible), matching when you started speaking to the moment the translated audio starts. Mentioned in another comment but we're still working on audio mimicking (voice clone then inflection transfer). Our model does a lot that Google Translate doesn't, even just around translation, such as taking into account who you're talking to in the meeting and the conversation context. + we have to do it much faster, so smaller audio chunks at a time!
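(The context-aware, chunk-based translation described above can be sketched in miniature. This is a toy illustration, not Pinch's actual pipeline: the real system operates on small audio chunks, while here text chunks stand in, and `translate_fn` is a hypothetical hook for whatever model does the translation. The point is only the rolling context window that lets each chunk be translated with awareness of who said what recently.)

```python
from collections import deque

class ChunkTranslator:
    """Toy sketch: translate small chunks as they arrive, passing along
    a rolling window of recent turns as conversation context."""

    def __init__(self, translate_fn, context_turns=5):
        # translate_fn(chunk, context) -> translated text (hypothetical hook)
        self.translate_fn = translate_fn
        # deque(maxlen=N) keeps only the N most recent (speaker, chunk) turns
        self.context = deque(maxlen=context_turns)

    def feed(self, speaker, chunk):
        # Translate the new chunk using the recent turns as context,
        # then record it so later chunks can build on it.
        out = self.translate_fn(chunk, list(self.context))
        self.context.append((speaker, chunk))
        return out

# Fake translate_fn that just tags each chunk with how much context it saw.
def fake_translate(chunk, context):
    return f"[{len(context)} turns ctx] {chunk}"

t = ChunkTranslator(fake_translate, context_turns=2)
print(t.feed("alice", "hello"))   # [0 turns ctx] hello
print(t.feed("bob", "hi"))        # [1 turns ctx] hi
print(t.feed("alice", "how?"))    # [2 turns ctx] how?
```

The latency/accuracy trade-off mentioned in the thread shows up here as the chunk size: smaller chunks start playback sooner but give the model less to work with per call.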
alloysmila
Just this morning I told myself I should build something like this. I work in global supply chain and the language barriers are an absolute mess.
christiansafka
It's hard, don't do it :D We have a few supply chain companies trying us out though! Would love to hear more about your experience.
Aspos
Impressive demo! Note that at 01:46 it says "You can speak Korean" in Ukrainian lol.
instagary
Congrats, really cool idea! You should add the demo video to your website in addition to the interactive version.
bongwater_OS
Hey just a heads up the demo on your site is broken (for me). English transcriptions are coming through fine but translations aren't being spoken, despite the output video stuttering for a moment at the time when it should.
christiansafka
We noticed that Swedish isn't currently working properly, but we weren't able to replicate this with any other languages. Please let us know if it's still having issues!
elixirnogood
are you guys using livekit for webrtc? If yes, are you using livekit agents as well?
christiansafka
Yes! LiveKit is great - and we are using livekit agents but had to override a few low-level library components for our use case.
elixirnogood
Do you have any concerns around scaling? I like the LiveKit stack, but if I'm not mistaken their agent architecture is based on multiprocessing (one OS process per 'session'/'conversation'), which doesn't sound very scalable. Btw, great demo, this is a cool technical problem to solve. I've spent a couple of months in this space (using a similar stack) and know for a fact that it's not easy.
christiansafka
Thanks, there are certainly a lot of fun and challenging problems to take on in the space. On scaling, the agent architecture isn't limited to one machine, so you can also autoscale your machines. It's essentially Python's Celery if you've tried that. It gets more tricky when you require GPUs though!
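(The one-process-per-session model being discussed can be sketched with the standard library. This is a minimal illustration of the pattern, not LiveKit's actual agent code: `session_worker` is a hypothetical stand-in for a per-session agent, and in a Celery-like deployment a scheduler would spread these workers across many machines rather than spawning them locally.)

```python
import multiprocessing as mp

def session_worker(session_id, results):
    # Stand-in for a per-session agent process (audio pipeline, model
    # calls, etc. would live here in a real agent).
    results[session_id] = f"handled {session_id}"

def dispatch(session_ids):
    # One OS process per session; each runs independently, so capacity
    # scales by adding machines rather than threads in one process.
    manager = mp.Manager()
    results = manager.dict()
    procs = [mp.Process(target=session_worker, args=(s, results))
             for s in session_ids]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return dict(results)

if __name__ == "__main__":
    print(dispatch(["meet-1", "meet-2"]))
```

Process isolation keeps one misbehaving session from taking down others, which is part of why this trades some per-machine density for robustness.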
elixirnogood
the demo doesn't mimic my voice unless I misunderstand 'mimic'
christiansafka
I didn't make this clear enough in the post, but we're still working on voice cloning and inflection transfer. Voice cloning is easier, but to support inflection transfer we have to modality-align an LLM.
Hey HN! I'm Christian, and my co-founder Keyu and I are building Pinch (https://startpinch.com), a virtual conferencing platform with translation that mimics your voice and synchronizes your lips in real-time to make you sound and appear as a native speaker in over 15 languages.
Here's a demo: https://youtu.be/Cu7KlbZ3gjw, but you can also try it for free on our website.
Over the last three years, Keyu and I were working at a company where we had to lead engineering and research teams across the U.S., China, India, and Europe. We felt the language barrier actively limiting our team's potential in terms of collaboration + productivity. The existing tools we tried operated in low-bandwidth mediums (mostly text), which 1) makes them slower, because they need to convert audio to text before translating, and 2) loses all information about how something was said.
At that point we knew there had to be a better way to connect across different languages and cultures, so we started building Pinch. Shortly after, we found out how challenging translation truly was. Balancing latency/accuracy for chunk-based audio translation, capturing inflection and tonality per statement, handling culturally specific phrasing, and making a seamless meeting experience are all unsolved problems we're taking on.
So far, we've seen some really interesting use-cases (many we hadn't considered!), from personal connections like a first conversation with foreign in-laws, to more business-oriented usage in sales and meeting foreign clients.
After long experience building conversational AI video/audio, we're incredibly excited to see what these same technologies can unlock for human<>human communication.
You can try a demo or create a meeting for free: https://startpinch.com
All feedback is appreciated, and we'd love to know how we're doing on the overall meeting UX and translation accuracy for your language. Thanks all!