[HN Gopher] Show HN: Clone your voice and speak a foreign language
       ___________________________________________________________________
        
       Show HN: Clone your voice and speak a foreign language
        
       Author : _josh_meyer_
       Score  : 115 points
       Date   : 2022-01-03 20:17 UTC (2 hours ago)
        
 (HTM) web link (coqui.ai)
 (TXT) w3m dump (coqui.ai)
        
       | alonmln wrote:
       | Cool, it's impressive how much it can do with a short sample,
       | although this seems like an easy way for end users to deep fake
       | their friends / enemies saying something.
        
         | tiborsaas wrote:
         | I tested it with your comment: https://sndup.net/mghy/ :)
         | 
         | It's also a new possibility to somewhat personalize the text to
         | speech engines. The above example is not really close to my
         | voice.
        
         | Philip-J-Fry wrote:
         | Maybe the solution is to have a randomly generated paragraph of
         | text to read which expires in a short amount of time. So you
         | can't predict it and you don't have enough time to splice
         | together a fake reading from something else.
        
         | kdavis wrote:
         | Currently we're looking at possible solutions, see for example
         | here[1]. If you have suggestions, feel free to chime in!
         | 
         | In the demo we specifically disallowed bulk uploads to hinder
         | such abuses.
         | 
         | [1] https://github.com/coqui-ai/TTS/discussions/1036
        
       | acqbu wrote:
       | Gold!
        
       | jeroenhd wrote:
       | Interesting. I like the addition of music to make sure it's not
       | just a raw voice sample. The output I get seems to be a mix of a
       | native speaker and my voice, because my (thick) accent is being
       | filtered out.
       | 
       | I suppose that if I ever take proper English pronunciation
       | classes, I now know what to strive for.
        
       | wombatmobile wrote:
       | Awesome!
       | 
       | How do I embed this?
        
       | bagels wrote:
       | Is there a static demo that I don't have to provide my own voice
       | for?
        
         | [deleted]
        
         | kdavis wrote:
         | We did not provide such a demo in part to hinder nefarious uses
         | of the technology.
        
           | crumpled wrote:
           | Honestly, how much of a hindrance is that? A person could
           | just supply a recording of another person, couldn't they?
        
         | reubenmorais wrote:
         | The project page has a bunch of pre-rendered samples and ground
         | truths: https://edresson.github.io/YourTTS/
        
       | pcarolan wrote:
       | This is incredibly impressive and does a great job of capturing
       | my voice. Well done!
        
       | akeck wrote:
       | Is it supposed to translate or just read with the target accent?
       | For me, it's only reading the English input text with the target
       | accent.
        
         | reubenmorais wrote:
         | It doesn't translate the text; you have to put in text in the
         | target language. But you can record audio speaking in any
         | language you want.
        
       | [deleted]
        
       | sxv wrote:
       | My 26 second training input perhaps wasn't enough. The result
       | sounded like someone else. Is the result some kind of merger of
       | my voice and a native speaker's?
        
         | reubenmorais wrote:
         | Similarity depends on many factors: recording quality, which
         | language you're synthesizing in (models trained on more
         | speakers do better), and diversity of prosody in your
         | recording. Try recording for a bit longer and "acting out" a
         | bit in your tone; that tends to give me interesting results :)
        
       | IanCal wrote:
       | Very interesting! Is the music an intentional blended track or an
       | artifact of generation?
        
         | _josh_meyer_ wrote:
         | very much intentional.
         | 
         | Background music makes misuse/abuse less likely (both
         | intentional and unintentional)
         | 
         | Read more about it in our open discussion:
         | https://github.com/coqui-ai/TTS/discussions/1036
        
       | momolo wrote:
       | is the model available?
        
         | _josh_meyer_ wrote:
         | Demo: https://coqui.ai
         | Code: https://github.com/coqui-ai/tts
         | Blogpost: https://coqui.ai/blog/tts/yourtts-zero-shot-text-synthesis-l...
         | Paper: https://arxiv.org/abs/2112.02418
        
           | echelon wrote:
           | This is so cool! Thank you!
           | 
           | How do y'all intend to profit (succeed as a startup) if
           | you're releasing so much publicly? I'd love to see you guys
           | succeed.
           | 
           | Really great to see where some of the Mozilla TTS folks wound
           | up, too.
        
       | SwiftyBug wrote:
       | I speak Brazilian Portuguese natively. I chose to record my voice
       | saying a specific sentence and to "translate" it to Brazilian
       | Portuguese using the exact same sentence. I was very pleased to
       | find out that I became a Mineiro from the countryside, one of the
       | coolest accents in Brazil!
        
         | actually_a_dog wrote:
         | You spoke Portuguese into it and it just changed your accent?
         | That's kinda cool.
        
         | reubenmorais wrote:
         | The Brazilian Portuguese model is a bit of an extreme showcase
         | (and thus really cool!), as it was trained on a single speaker
         | (entirely recorded by the main author of the paper, Edresson
         | Casanova, who's Brazilian).
         | 
         | The fact that it can do multi-lingual voice cloning at all in
         | that case is already surprising. You can find more details on
         | the project page [0] and paper [1]. And here's the corpus. [2]
         | 
         | [0] https://edresson.github.io/YourTTS/
         | 
         | [1] https://arxiv.org/abs/2112.02418
         | 
         | [2] https://edresson.github.io/TTS-Portuguese-Corpus/
        
       | winter_squirrel wrote:
        
       | ceva wrote:
       | it says enter your text here ..
        
         | kdavis wrote:
         | You're free to enter any input sentence you want in the text
         | box.
         | 
         | The input sentence generally should be in the language you
         | selected from the dropdown. For example, if the dropdown has
         | "French" selected you could enter the text "Allons enfants de
         | la Patrie, Le jour de gloire est arrivé!"
         | 
         | Clicking "Submit" then generates a TTS reading of the sentence
         | you input in the language selected from the dropdown.
         | 
         | For fun you can mix and match. In other words, select a
         | language from the dropdown and enter text in the text box
         | _not_ in the language selected from the dropdown. (For example,
         | the dropdown could have "French" selected and the sentence
         | could be "O say can you see, by the dawn's early light". This
         | gives interesting results: it sounds as if a native French
         | speaker is speaking English.)
        
       ___________________________________________________________________
       (page generated 2022-01-03 23:00 UTC)