What? Most of the voices I tried sound really intense, even angry. Very strange emotional flavors for what should be quite neutral text inputs. The laughing was literally ha, ha, ha, ha. Not even remotely a genuine human laugh.
The actual sound quality of the output is impressive (clear treble, no weird artifacts between syllables, etc.), but I just don't understand the weird "edginess" of the speech.
I think their model poorly weights the spatial context in sentences. Humans speak with a little rhythm: cadence. The lack of cadence and emotion puts it deep in the uncanny valley.
Yeah, when I tried their demo I was a bit confused too - it's not that impressive.
But their model is really good when it comes to cloning voices from small audio samples. It was discovered by 4chan, unfortunately [1]. I have only seen a few clips, but all of them were racist, sexist, or worse. Not appropriate to link them here, I guess. You can see an official sample on their YouTube channel [2]. However, the voices and conversations I heard yesterday, other than being disgusting, were high quality, believable, and full of emotion. The voices of Obi Wan Kenobi or Joe Biden sounded so genuine that it was creepy. I know there have been tools to deepfake voices for years now, but this is the first time I'm seeing one that sounds so authentic.
Best I've ever heard. Steep pricing to get only 2hrs a month and only 2,500 characters at a time, though. I was about to sign up to use this to read articles to me, but that amounts to about 4 articles per month, fed into the generator in chunks.
The reason why ElevenLabs is so good is not because of the default voices, it's because it's so easy to train new voices. You only need a minute or two of someone speaking and it can mimic the voice pretty well, good enough to fool most people.
However, their pricing is completely wrong; it should be cheaper and offer more.
The voice sounds good. However, I would like to see if it's able to parse and read, e.g., a PDF file with good flow. I use (and pay for) Speechify on a daily basis to read through PDF books for my studies. I can see that they still have a lot to improve, but I still couldn't find a better solution. Any suggestions?
I had the same suboptimal experience with Speechify. Did not renew.
There are text extraction utilities for PDFs which reconstitute paragraphs and whatnot. Seems like an obvious thing to do. I suggested it, but didn't hear back.
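The "reconstitute paragraphs" step those utilities perform can be sketched in a few lines. This is a minimal, generic illustration (not any particular tool's implementation), assuming the raw text has already been extracted from the PDF: hard-wrapped lines are rejoined, end-of-line hyphenation is undone, and blank lines are kept as paragraph breaks.

```python
import re

def reconstitute_paragraphs(raw: str) -> list[str]:
    """Rejoin hard-wrapped lines from PDF text extraction into paragraphs.

    Blank lines are treated as paragraph breaks; everything else is
    joined with single spaces. Hyphenated line breaks ("exam-\\nple")
    are merged back into whole words.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", raw):
        # Undo end-of-line hyphenation, then collapse remaining breaks.
        joined = re.sub(r"-\n\s*", "", block)
        joined = re.sub(r"\s+", " ", joined).strip()
        if joined:
            paragraphs.append(joined)
    return paragraphs
```

Feeding paragraphs (rather than raw wrapped lines) to a TTS engine avoids the mid-sentence pauses that line breaks otherwise introduce. Real PDFs also need column detection and header/footer stripping, which this sketch ignores.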
I imagine that PDF munging skills aren't common and a solo developer doesn't have the bandwidth to be smart about so many different techs.
I made the same suggestion, and they provided me with a kind of roadmap in this direction - I'm just a customer, no relationship with them or anything. The roadmap, however, was just a bunch of tickets, no deadlines. So it's on their radar, probably based on your feedback and that of many others. I'm not really willing to renew either (unless they give me a good discount) and would like to explore other solutions.
It's pretty good. I've been using Amazon's Polly, which so far has been the most realistic to me (https://aws.amazon.com/polly/). I feel like Polly still has an edge in variety of voices.
Related - I've found BeyondWords to be really nice. Its generated speech is not quite this good, but it's close, and it has a library of fairly different voices. Plus, its UI allows you to create audio with a mix of voices, which most other such services don't offer.
Plug warning - I've been using it to create narration for short stories for a while, and the output is better than I would have expected. Here's a recent example involving two characters talking - https://storiesby.ai/p/melancholy-musings-over-drinks
I would disagree; they're almost too articulate, so you get that very artificial, clipped, and stilted speech from the voices you used in your story. It's especially apparent in the female voice.
The default "Adam" voice sounds lifelike, but I wouldn't call him "conversational/clear". He sounds too forceful and dramatic, like he belongs in a cartoon.
With real voice actors, we can direct them to say their lines with more sadness. Or guarded desperation and struggle, on the verge of crying but clinging to hope... etc. This kind of subtle direction is not possible with artificial speech.
For narration it can work. But for dramatic character acting in animated films, the results make the characters sound like terrible actors. More granular control is needed over specific words, syllables, tone, emphasis and timing.
Is there an open source or perpetual-license way of “cloning” one's voice?
This would be a boon to those who have lost or will lose the ability to speak or speak well, especially if it can be integrated into communication apps and one's cell phone.
The number of people who could use this is going up as the HPV-positive head and neck cancer wave ramps up.
In case anyone knows, what's the defensible moat here?
I can get almost the same quality using open source models. Plus, I can fine-tune them to get custom voices. That means any company that needs TTS is better off paying me once to build a customized open source solution instead of forever paying this company per minute.
Hmm, I don't know of any open source project that can get similar quality - can you name one? This one also allows fine-tuning custom voices on a minute of audio, and it works great.
TorToiSe[0] is pretty good, but I agree 11 is currently state of the art. Won't be long until GP is correct, though; 1.5 years at best is my guess. The next moat will be multiple languages, and maybe something like more control over tone, which is perhaps more suited to a product.
I'm using it for aidev.codes (which uses OpenAI's new models, similar to ChatGPT) in the new dialog features I'm developing, such as interviewing clients for requirements. The issue right now is that even though they have a streaming endpoint, the latency is all over the place and often not really adequate for something that is supposed to be conversational. But when it's working well, it's just about fast enough. I should probably ask them if there is a trick. Right now I am sending multiple sentences at a time and then playing them one after another when the audio element emits the ended event.
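The "send ahead, play in order" pattern described above can be sketched as a pipeline: a background thread requests audio for upcoming sentences while earlier clips play, so variable synthesis latency is hidden unless the buffer runs dry. This is a minimal sketch; `synthesize` is a hypothetical stub standing in for the actual TTS API call, and `play` stands in for handing bytes to an audio element or player.

```python
import queue
import threading

def synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for a TTS API call returning audio bytes."""
    return sentence.encode()

def stream_speech(sentences, play):
    """Pipeline synthesis and playback: fetch audio for upcoming
    sentences in a background thread while earlier clips play."""
    clips: queue.Queue = queue.Queue(maxsize=3)  # small lookahead buffer

    def producer():
        for s in sentences:
            clips.put(synthesize(s))  # network latency is absorbed here
        clips.put(None)  # sentinel: no more audio coming

    threading.Thread(target=producer, daemon=True).start()
    while (clip := clips.get()) is not None:
        play(clip)  # in a browser, trigger the next play on "ended"
```

The bounded queue is the key design choice: it caps how far ahead you synthesize (and pay), while still smoothing over latency spikes as long as playback time exceeds average synthesis time.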