AI Accent Conversion for call centers
74 comments
· March 29, 2025 · robviren
nasmorn
In the case of a customer service transaction I don’t really care about someone’s individuality. I just want to understand what they are saying. I once checked into a hotel in London and couldn’t understand the Indian staff for the life of me. How is it better to make them say every sentence three times? Indians are often harder for me to understand than someone who just speaks English badly, because their vocabulary might actually be large and varied and they use phrasing that is very foreign to me. If someone is just a foreigner with little knowledge of English, their 1-2k vocabulary will be much easier to understand even if their pronunciation is strange.
robviren
I agree, but business won't leave it there. Will you be as comfortable if people in India or China use helpful technology to clear up your confusing accent if you hoped to learn another language? I have never seen a technology stay in the box we want it to. What I have seen is tech foisted onto society without a second thought as to the impact.
Clear communication is a laudable goal. But I fear what might be lost in the ruthless pursuit of it.
Brian_K_White
"In the case of a customer service transaction I don’t really care about someone’s individuality."
That argument can be applied to practically everything, and so is invalid. It does not serve you or any of us other humans to dehumanize other humans, which btw you are in the set of other humans.
People at work performing a function are still people, not functions.
nasmorn
I am not dehumanizing anyone. They can be whoever they want, but if I don’t understand them they are not performing the function. They might still be the most interesting people, but I will never get to know since I literally cannot communicate with them.
dzhiurgis
Disagree. To work in a call center you need to have an accent that's clear enough.
This will simply enable more people to work there.
thunky
> I just don't like the subtext of forcing people to sound like some core "good" voice. What will be lost in that?
Would you have a problem with using full translation?
The difference here is that the (accent) translator doesn't have to do as much work.
Also it should work both ways: foreign language/voices can be converted to local Indian languages/accents too.
unsupp0rted
> I am not in love with the idea that the fate of AI is to boil down each person to a lowest common denominator.
Are you kidding, that's the holy grail! When I need customer service I want every interaction to be by the book, zero personality, from them or from me.
The problem is it's not right now: when somebody with a non-Western English accent answers a call to a Western company (Airbnb, your bank, etc) it's pretty much a guarantee that you're about to deal with massive incompetence and apathy. You don't notice this until you live long-term in a homogenous country, where the people answering customer service calls are the same people you see on the bus/train after work. It's like a lightning blast into your brain: "holy cow, customer service actually works here?".
And you might say, "well incompetence and apathy is all call center people who work for American companies are empowered to give you, regardless of the accent or geography".
But that's not true: Filipino/a call center employees are gushing with upbeat script-following good humor, but they never solve your problem.
Whereas "Dave" from the midwest sighs and rolls his eyes before taking every call but he actually figures out the reason you're calling and uses his brain to try to resolve it, in the boundaries he's empowered within.
That's not a popular thing to think or say, but I know I'm not the only one who, after literally hundreds of these interactions, has developed this heuristic.
And you might say, "well 'Dave' from the midwest is by definition higher up the ladder and more empowered to solve your problem, because otherwise you'd be getting someone in the Philippines/India who's only allowed to follow their script".
Sure. But that's why it'd be great if there was one common-denominator: anybody who answers your call has as much ability to resolve issues as anybody else, tries as hard as anybody else, communicates as clearly as anybody else... that's the holy grail.
cyanydeez
I dunno man, Capitalism wants us all to be the same grey goo buying slop, but it's clear people and mammals in aggregate desire to be individualized, and these two forces are in opposition but only one of them is currently taking over the world in authority and power.
delichon
I work in a team with several Indian developers. After around three years of that I still struggle with some of their accents. I'd love to have real-time accent conversion in our daily standups. Or subtitles. The same goes for a thoroughly locally accented colleague who mumbles a lot. I imagine it won't be long before this tech starts showing up in hearing aids.
jwally
Never even thought of the implications of AI+AR for mundane-seeming applications like hearing aids. Really cool. Even when you remove the flood of johnny-come-lately, "just add AI" marketing, the number of things that can be enhanced to legitimately make people's lives better truly does seem staggering.
nehal3m
The prospect of inserting a plausible bullshit generator owned by a tech megacorp between myself and the outside world seems more dystopian than making my life better.
devsda
This is fine for online meetings but for in-person meetings, it is a minefield of interpersonal relationship risks if not used discreetly. Can't predict how the other party reacts.
I have had a mix of colleagues with diverse backgrounds; none of them were particularly challenging to understand except for one native Chinese speaker. In the beginning, we indirectly asked for clarifications, but stopped doing that after someone candidly mentioned that there's a risk of offending the other person. Now we just send meeting minutes to avoid misunderstandings.
dzhiurgis
Or a Zoom and Slack plugin. I can’t imagine Apple or Google adding this too, though, far too “unethical”.
throwaway48476
Coming to a scam call center soon.
userbinator
Maybe the accent is a "filter", just like misspelling and grammar mistakes in phishing emails.
throwaway48476
If you were trying to educate people about scams filtering out certain accents would be very effective.
fnordian_slip
I don't think that's meant by "filter" in this context. I believe instead that it is a reference to scammers filtering out those people who would likely catch on anyway at some point in the scam, so that they can concentrate on the more naive potential victims. Those victims who ignore things like spelling mistakes are much more likely to ignore other warning signs, too.
yapyap
haha yeah, if that’s the case it’s an unintended filter though cause the Indian or Nigerian scam call stereotype works cause they have a lot of scam call centres there.
3np
Listened to the demos. Did not improve comprehension vs original (all of which were clear enough). Besides now they sound like soulless robots, which is distracting and would at least make me very suspicious if I am speaking to a human at all.
Springtime
It's an interesting use case. They show two examples of the accent modification: Indian and Filipino. In my experience the agents at every Filipino call center I've interacted with (from auto-redirected customer support numbers) have learned American accents, while agents on Indian redirected calls have, IME, kept their native accent.
hnbad
Philippine English is apparently exported to much of Asia via Filipino English teachers (as they're native speakers of their variety of English) so this is likely not just about the Philippines.
Indian English on the other hand is of course about India, Pakistan, Bangladesh, etc.
It's pretty obvious what this product is for and who the target market is. Companies don't outsource their call centers to countries where these varieties of English are spoken by the majority of English speakers because they are looking for the best talents.
eru
> Companies don't outsource their call centers to countries where these varieties of English are spoken by the majority of English speakers because they are looking for the best talents.
Any company anywhere always has to look for the best talent at the best price.
Otherwise, you'd only hire Nobel laureates, and go bankrupt immediately.
hnbad
> Any company anywhere always has to look for the best talent at the best price.
Surely you understand the difference between wanting to hire superstars but then adjusting your targets downward based on what salaries you can offer versus wanting to pay as little as possible for call center agents and then adjusting your targets upward based on what you can get away with. "I want the best I can get but I need to be able to afford it" vs "I want to pay nothing but I need to spend enough to get something that works".
Expectations for customer service tend towards rock bottom for many businesses. It's something they have to provide or their customers will get very upset (or they might even get them into legal trouble) but it only has to be good enough to be serviceable. And for call centers this usually means you end up having people closely follow a script anyway so you're literally just paying people to be human dialog trees. It's a nuisance but you can't get away with not having it, so you want to pay as little as possible.
You're talking about cut-offs for price ranges - that's indeed a given which is why I thought it doesn't bear mentioning. I'm talking about whether you go over the resulting list sorting from lowest to highest ("best of the cheapest") or highest to lowest ("cheapest of the best").
raverbashing
Yeah, while the demo is good, the processing is still noticeable in the examples, and it introduces a different type of distortion.
(I know they don't do this, but it would be funny if it converted the customer accent back into the agent's accent)
worthless-trash
Having visited and spent time in .ph: the accent is not "learned", it's the default.
Springtime
I wonder if this is a recent thing or something. I haven't visited the Philippines but IRL haven't heard Filipinos speak with this accent, neither did Filipino friends ~15 years ago (instead sounding more like the unadjusted Filipino example from the website here).
worthless-trash
Oh. Just to be clear I'm comparing its closeness to other primary English speaking countries.
There absolutely is still a local language influence.
I think the accent it is closest to is American.
kleiba
Interesting use case, but I can hear lots of audio artifacts in the demo video. This doesn't seem consumer-ready just yet.
Apart from that, it also raises some ethical questions. Your voice, its tone and its expression, is probably one of the most important features that define you as a person. I cannot imagine anybody being psyched about their voice being changed in a way that's out of their control.
johnisgood
I can hear the artifacts, too, and I can still tell it is an Indian speaking.
kleiba
In all fairness, I don't think the goal is to hide that completely, just to mask the accent to some degree in order to improve intelligibility for speakers from different accent regions.
dzhiurgis
Makes far more sense to use this on the receiving end, then.
johnisgood
Yeah, for that there is voice cloning anyway.
bcowde
"preserving the speaker’s voice for authenticity" - I knew we'd get to this point, but masking your accent seems like the most inauthentic practice - and, dare I say, downright deceitful.
devonsolomon
Humans are incredibly perceptive, and the soft robotic buzz (as in the demos) will do as much to diminish rapport as a foreign accent.
gertrunde
I'm not sure this is as significant as their marketing folk would have us believe. (I know, shocking).
Accent is rarely the whole issue in dealing with non-native language call centres, it's just one part of the language skills package.
- It's not going to help if the caller isn't very good with the callee's language.
- It's not going to help the caller understand the callee any better.
- In effect, I feel the only language barrier it's helping to overcome is the callee's feelings towards the caller, although I'm sure that is of some help.
(It maybe doesn't help their case that their test cases use people with reasonably good English language skills anyway).
johnisgood
It is going to be perfect for Indian scammers, but then again, we already had a way to use anyone's voice: voice cloning, or voice synthesis.
Scammers already use the relative of the target's voice at times.
topynate
It's not perfect but it does make speech easier to understand, which is the ostensible purpose. There's a slight uncanny valley effect. Latency requirements are very tight, and Indian English is actually timed differently to American or British English, so it's a hard problem.
The company should consider targeting RP instead of General American. In my experience attempting the latter with a slight Asian tinge makes you sound like you're trying to be a WASP, while speaking the former with the same small errors makes you sound like you studied PPE at Oxford.
booleandilemma
Maybe it's because I work with Indians every day, but the people in the videos sounded fine even without the software. They should try it on someone with a really thick accent, like this guy: https://youtu.be/pj705DvCSxg?si=Am1Z2ILqfu9wvu_I
Do they have software for the reverse case? Can it give someone an Indian accent, for example?
The processing made them sound robotic, in my opinion. I'm certain that with enough engineering dollars thrown at the problem, you could make anybody sound like anybody else live.
I am not in love with the idea that the fate of AI is to boil down each person to a lowest common denominator. I'm curious what others think, at what point between totally unprocessed source material and everyone sounding like the same four marketable AI voices does a solution like this become problematic? I get the intention of the product, appreciate the focus on clear communication, but can't help feeling some will take it further and I just don't like the subtext of forcing people to sound like some core "good" voice. What will be lost in that?