[HN Gopher] The Seamless Communication models
       ___________________________________________________________________
        
       The Seamless Communication models
        
       Author : skadamat
       Score  : 537 points
       Date   : 2023-12-01 14:53 UTC (8 hours ago)
        
 (HTM) web link (ai.meta.com)
 (TXT) w3m dump (ai.meta.com)
        
       | infotainment wrote:
       | It's amazing how far text to speech has come in the past few
       | years, but what I'm wondering is when this tech will finally make
        | it into local TTS engines baked into the OS (e.g. for screen
        | readers, etc.).
        
         | PartiallyTyped wrote:
         | The accessibility nerd in me is excited!
        
         | callalex wrote:
         | This is already built into recent iOS devices and it's called
         | Live Captions.
        
           | freedomben wrote:
           | Same with Android (Pixel phones at least).
           | 
           | I'm the most excited for an open source one though, and it
           | would be incredible if this could become it. I do 95% of my
           | compute on desktop linux and it sucks being behind.
        
       | coffeebeqn wrote:
        | We can't be that far off from almost perfect real-time
        | translation. There is some latency, of course, to hear and
        | process.
        
         | mrob wrote:
         | Differences in verb-subject-object word order will always add
         | latency. If you want to translate from German, with the verb at
         | the end, to Welsh, where the verb goes at the start, you'll
         | have to wait for the complete sentence before you can begin.
        
           | tralarpa wrote:
           | It's very impressive what simultaneous interpreters can do.
           | They don't wait for the end of the sentence.
        
             | numpad0 wrote:
             | Yeah they backtrack on branch prediction failures.
        
               | dylan604 wrote:
               | What kind of heartbleed that must introduce.
        
               | Vecr wrote:
               | You mean meltdown/Spectre?
        
               | dylan604 wrote:
               | probably, but you got the gist anyways
        
             | MrsPeaches wrote:
             | Even they struggle with jokes though.
             | 
             | This may be apocryphal but I've heard that in formal
             | settings (e.g. UN) they won't translate it and will instead
             | give instruction on when to laugh.
        
           | d3m0t3p wrote:
            | Not necessarily true. For the first few sentences you won't
            | be able to do it, but afterwards, once the context is
            | established, you don't really need to wait for the verb; you
            | can predict it. For example, if you are speaking about
            | cleaning the house and you detail that you have cleaned the
            | kitchen, the stove, and so on, you can predict the verb with
            | only the start of the sentence. I don't have any source to
            | back this up, but it sounds plausible.
        
             | gberger wrote:
             | What if the predicted verb was incorrect, but the model has
             | already translated the incorrect prediction? How does it
             | tell you about a mistake?
        
               | mrandish wrote:
               | A good approach might be to start with how top notch,
               | ultra-experienced human translators handle corrections
               | for real-time scenarios, for example, the expert
               | translators that do the ear monitors at the United
               | Nations. I've worked with a few such real-time
                | translators when preparing keynote speeches, and they
                | seemed to have rigorous processes that were quite deep.
               | Probably a ton of domain expertise to be captured there.
               | 
               | That said, I suspect that real-time language translation
               | is always going to be somewhat imperfect due to its
               | nature. Non-real-time translation of literature is still
               | a subjective art form even at the very high-end of human
               | expertise.
        
             | shkkmo wrote:
              | Once you start predicting what someone is going to say, you
              | are no longer translating their speech.
        
             | Teever wrote:
             | Yeah but then you're just introducing branch mispredictions
             | which will cause latency and potential confusion down the
             | line.
             | 
             | It's all a trade off.
             | 
             | Either way it's extremely exciting that we get to even
              | discuss this stuff as real possibilities.
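The predict-and-backtrack idea discussed in this subthread can be sketched as a toy: emit a translation immediately that speculates on the not-yet-heard sentence-final verb, then reconcile once the full sentence arrives, retracting only the mispredicted tail. The function names and the word-level "translation" are illustrative assumptions, not how the Seamless models actually work.

```python
def speculate(heard_so_far, predicted_verb):
    """Emit output immediately, guessing the not-yet-heard final verb."""
    return heard_so_far + [predicted_verb]

def reconcile(committed, full_sentence):
    """Once the real sentence is complete, keep the stable prefix and
    retract the mispredicted tail. Returns (final_output, n_retracted)."""
    k = 0
    while (k < min(len(committed), len(full_sentence))
           and committed[k] == full_sentence[k]):
        k += 1
    return full_sentence, len(committed) - k
```

Guessing "cleaned" while the speaker actually ends with "painted" costs one retracted token (the latency/confusion trade-off mentioned above); a correct guess costs none.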
        
         | Innervisio wrote:
          | Although true, and considering what "mrob" also replied, this
          | will never mean full translation every time, all the time. It
          | will work in specific environments and with specific linguistic
          | expectations.
         | 
         | I've been learning german since 8 years, and the amount of
         | expressions and different ways to say things around the country
          | is impressive. There'll be an "interpretative" real-time
          | translation, but it won't guarantee full understanding in many
          | cases, maybe ever.
         | 
          | Another thing, and we have this in common with all languages,
          | is context, which I believe is difficult to address.
         | 
          | Nevertheless, it's impressive how far we've come, and I
          | acknowledge the usefulness of these tools. However, human
          | knowledge will always be crucial if we want to guarantee full
          | understanding.
        
           | InCityDreams wrote:
           | >I've been learning german since 8 years,
           | 
           | "Since", as used here, would lead me to guess you are not a
           | native English speaker?
        
       | WhatsName wrote:
        | Did anyone compare this to NLLB (also Meta) yet?
        
         | trovas wrote:
          | In the paper, the reported results show a very similar level
          | of quality.
        
         | jkw wrote:
         | We're the same team! We have some comparisons in the paper.
        
       | ukuina wrote:
       | Next step is combining the output with few-sample speech
       | synthesis so the output is in the original speaker's voice!
        
         | modeless wrote:
         | This does that already. At least, to a first approximation.
         | Voice cloning is not that great in general right now.
        
           | blovescoffee wrote:
           | The voice cloning worked pretty well for me. From english to
           | spanish I noticed that the first few words sounded more like
           | me than the last few words. Also it doesn't sound like how I
           | speak in spanish but that's expected.
        
           | coffeebeqn wrote:
           | Voice cloning works pretty well already but not necessarily
           | on one 10 sec sample as the source data. If you can give it
           | some hours of data it'll work much better
        
             | modeless wrote:
             | Do you have examples of it working well? I haven't heard
             | anything that really impressed me. Nothing close to a good
             | human impersonator. We're a long, long way from replacing
             | voice actors, even considering the rapid rate of progress.
        
       | kaycebasques wrote:
       | Besides the obvious good news about making it easier for people
       | to communicate with each other across languages, it's also
       | exciting to me that we're trending towards a world where I can
       | tap into all the knowledge that only exists on the non-English
       | web. I'm sure there are vast troves of programming knowledge in
       | the Japanese-only web for example. The Chinese-only and Russian-
       | only web are obvious candidates too but presumably those are
       | harder to access for other reasons.
        
       | nickreese wrote:
       | My wife was training to be a professional voice actor to do
       | dubbing in several languages when we met.
       | 
       | I told her then that the industry would be disrupted by AI before
       | she retired.
       | 
       | Glad she pivoted. Really impressive results.
        
         | 0_____0 wrote:
          | It won't replace high-end talent; I don't think models can
          | replicate the nuance for a long time. However, the entire
          | low-to-mid end of the market is going to get nuked from low
          | earth orbit.
        
           | Shish2k wrote:
           | I wonder which will happen first - AI evolves to work well at
           | the high-end, or high-end humans retire and there's nobody
           | left in the low-to-mid end to fill their shoes...
        
             | callalex wrote:
             | Given the modern trend of on-screen actors doing voice
             | work, I think there will be a supply of talent for at least
             | a few more generations.
        
           | crakenzak wrote:
           | It will absolutely replace high-end talent. Anything that a
           | human can do will be able to be done 10x better by a model --
           | especially in such a narrow and well defined domain.
        
             | sushisource wrote:
             | Did you hear the output examples? Yeah, I think not. I
             | mean, definitely on the way, but there's no way if you need
             | quality acting in your dub that you're going with this.
        
               | ygjb wrote:
               | These are models specially tuned and sized for near real-
               | time, instant translation. It would be naive to think
               | that there aren't technical creatives building and
               | training models tuned for expressiveness and nuance in a
               | more controlled environment.
        
               | crakenzak wrote:
               | Maybe not in the current state of the model, but judging
               | by the rate of improvement we're all seeing it's just a
               | matter of time (and data+compute+research obv).
        
               | dvngnt_ wrote:
                | I think the key word is "will".
                | 
                | A few more years of improvements, if they happen, could
                | be disruptive.
        
               | dontupvoteme wrote:
               | That's what they gave us plebs. To think they don't have
               | a superior one they can sell...
        
           | chrismorgan wrote:
           | It won't _replace_ it, but it's very likely to _supplant_ it,
           | just about destroying the segment by reducing demand by being
           | _good enough_ and so much cheaper, especially as people get
           | more used to it.
           | 
            | Typesetting. Music engraving. Bookbinding. The quality of all
            | these fields has been materially harmed by advancements.
           | 
           | Computer typesetting has, by and large, been a significant
           | regression, though the gap has largely been made up now if
           | you make the right choices.
           | 
           | Published music scores used to be set by experts. Now they're
           | set by novices using software that is mechanical in method
           | and generally quite insipid. Most are _atrocious_ compared to
           | the old masters, and mediocre at best compared to the typical
           | published scores from a hundred years ago; and very few
           | popular scores are really good (... and if they are, there's
           | a reasonably high chance they've used GNU LilyPond, which has
           | focused on this problem). But the barrier for entry is _so
           | much lower_ , and people have got used to the inferior
           | results, so I don't know if _anyone_ engraves music the old
           | way, and even people that know better largely just shrug and
           | make do with the new. Like with computer typesetting, there
           | is hope because things _have_ slowly improved. But most will
           | continue to be mediocre.
           | 
           | Books used to be bound with cold glue. It takes time to set,
           | but the results are very good, supple and long-lasting. Then
           | along came hot-melt glue, and it's just _so_ much friendlier
           | for cheap manufacturing because books are finished within a
           | few minutes instead of a day or two, that I don't think
           | _anyone_ produces books the old way any more, even though the
           | results are _abysmal_ in comparison (compare the binding _and
           | reading experience_ of a paperback from the '40s or '50s with
           | one from the turn of the century; no one after tasting the
           | old will desire the new; for he says, the old is good). But
           | they're just (barely) good enough. Unlike the other two, I
           | don't think there's any hope here--the regressive advancement
           | crowded out the superior but dearer option so that no place
           | was found for it.
        
             | pclmulqdq wrote:
             | You can still get relatively good published music scores
             | from a few of the old German shops (Schirmer, Henle, etc.),
             | but they are very expensive. They are a joy to use when
             | playing, though, since the music is very clearly laid out
             | and page turns are in the perfect place, etc. Finale and
             | Sibelius are controllable enough that you can use them to
             | do fantastic layout, but many people either do not
             | understand how to make a score readable or don't care
             | enough.
        
               | TeMPOraL wrote:
               | That, and what GP describes, is what I see as the overall
               | trend of the market to hollow out the middle. It's not
               | just about technology (though it plays a big role), as
               | all optimization coming from competitive pressure -
               | materials, processes, business models, marketing.
               | 
               | What seems to universally happen is, the market
               | bifurcates - one part is in a race to the bottom, the
               | other (much smaller) aims for super premium tier
               | (overpriced quality), because only those two positions
               | are sustainable, once the race-to-the-bottom side drags
               | all the economies of scale with it. So as a consumer, you
                | get to choose between cheap low-quality garbage that's
               | barely fit for purpose, and rare, super-expensive,
               | professional/elite high-end products. There is no option
               | for "good value for reasonable price".
               | 
               | This has been happening to everything - software,
               | furniture, construction, electronics, vehicles, _food_ ,
               | you name it.
        
           | RowanH wrote:
           | I'm using AI for training videos for my startup. Never going
           | back to voice actors outside of primary marketing videos.
            | The sheer convenience of the write/listen/tweak cycle on
            | scripts is insane. In minutes you can do a voiceover that
            | would have taken hours + days of delay before.
           | 
            | Sure, the final result sounds slightly robotic. 99% of people
            | wouldn't care, and you can get more training videos done,
            | faster, for a fraction of the cost.
           | 
            | [Edit] And I'll add that the difference between 6 months ago
            | and today is noticeable. I imagine every 6 months we can just
            | re-download updated voiceovers, and each time they will sound
            | slightly more polished.
        
         | ggregoire wrote:
         | > I told her then that the industry would be disrupted by AI
         | before she retired.
         | 
         | Yes. I just discovered there is a text-to-speech addon [1] (now
         | a few months old) for World of Warcraft that adds voices for
          | every NPC in the game... It is so impressive and such a game
          | changer (pun intended) that I naively asked in the chat of the
          | Twitch
         | stream I was watching "when did Blizzard add voices to the
         | NPCs??". For an instant I really thought Blizzard contracted
         | actors, but no, someone like you and me just used AI to
         | generate realistic voices for every character in the game. I
         | don't think it's ready yet to completely replace actors in
         | video games (surely it will in the near future tho) but voice
         | acting is something so expensive to do that I can see studios
          | and developers in 2024 already using this tech for all the
         | optional dialogues and secondary characters' voices.
         | 
         | [1] https://www.curseforge.com/wow/addons/voiceover
        
           | lyu07282 wrote:
            | Another recent example: The Finals uses AI voice generation
            | for realtime game announcements.
           | 
           | https://youtu.be/kZ87wiHps9s
        
           | freedomben wrote:
           | I've wondered at what point this would happen. I think it
           | could now, but from what I've read the voice actor unions are
           | able to prevent it currently (at least for AAA games or non-
           | indie devs). Many of them have agreements/contracts in place
           | for the foreseeable future, and being the first big company
           | to replace them is a heap of terrible press that nobody is
           | going to want to touch. I think it's the same reason
           | Hollywood reached the AI agreement recently too.
        
         | Halong wrote:
         | My wife is paying our mortgage teaching English on Preply. I'm
          | extremely worried about where we'll be in 10 years.
        
         | ilaksh wrote:
         | What did she pivot to? I don't think any currently existing job
         | is really safe in the medium-to-long term.
        
       | Jayakumark wrote:
       | How does this compare to whisper-large-v3 on STT?
        
         | trovas wrote:
         | I work on seamless. You can see the results in the paper. M4Tv2
         | is significantly ahead (Whisper Large v3 - 16.9 BLEU vs. M4Tv2
          | 26.6). These are averages over 81 directions, X->English.
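For readers unfamiliar with the metric quoted above: BLEU scores a translation by its modified n-gram overlap with a reference, scaled by a brevity penalty. Below is a toy single-reference, sentence-level version just to show what the numbers measure; real evaluations like the one cited use corpus-level tooling (e.g. sacreBLEU) with smoothing.

```python
import math
from collections import Counter

def bleu(hypothesis, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of modified 1..4-gram
    precisions against a single reference, times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # clip each n-gram count by its count in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # no smoothing in this toy version
        log_precisions.append(math.log(overlap / total))
    # penalize hypotheses shorter than the reference
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100.0 * brevity * math.exp(sum(log_precisions) / max_n)
```

An identical sentence scores 100, a completely disjoint one scores 0, and everything in between (like the 16.9 vs. 26.6 above) reflects partial n-gram overlap.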
        
       | 999900000999 wrote:
       | Can't wait for someone to roll a language tutor out with this
       | tech.
       | 
       | Everyone gets a personal tutor for hours a day.
       | 
       | I would absolutely love a VR game where I just need to work in
       | China or Mexico all day and pick up the language that way.
        
         | modeless wrote:
         | This is what I'd like to build (the tutor part at least, not
         | the VR game part yet). I'm planning to extend my current
         | English only rough prototype[1] to support Mandarin. (I happen
         | to be learning Mandarin myself at the moment, and there are a
         | bunch of open source bilingual Mandarin LLMs and speech
         | synthesizers from China to choose from.)
         | 
         | I think a lot of people are working on similar things right
         | now. I know of one called http://yourteacher.ai
         | 
         | [1] https://apps.microsoft.com/detail/9NC624PBFGB7
        
           | siraben wrote:
           | Is there a high quality speech synthesizer (ideally local)
           | for Mandarin you have found? There are some subtleties with
           | tone sandhi rules and how they interact with prosody that I
           | feel are lacking with current TTS voices I've tried.
        
             | modeless wrote:
              | The first one I plan to try is
              | https://github.com/netease-youdao/EmotiVoice
             | 
             | I don't have the expertise to judge the quality of Mandarin
             | pronunciation myself, being a beginner. But it sounds OK in
             | English and it's made by native Mandarin speakers in China
             | so I expect that it sounds better in Mandarin than English.
        
               | siraben wrote:
                | Sounds pretty good, although still lacking in natural-
                | sounding tone sandhi (e.g. try "yi xia": it should be
                | yi2xia4 instead of yi1xia4).
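The yī sandhi mentioned above is regular enough to state as a rule: yī (tone 1) surfaces as yí (tone 2) before a 4th-tone syllable and as yì (tone 4) before tones 1-3, keeping tone 1 utterance-finally. A toy sketch over numbered-pinyin tokens; the pre-segmented input and the omitted ordinal exception (dì-yī keeps tone 1) are simplifying assumptions.

```python
def apply_yi_sandhi(syllables):
    """Rewrite "yi1" according to the tone of the following syllable:
    tone 2 before a 4th tone, tone 4 before tones 1-3, unchanged when
    utterance-final. A real system would also track word boundaries to
    exempt ordinals like di4 yi1 ming2."""
    out = list(syllables)
    for i in range(len(out) - 1):
        if out[i] == "yi1":
            out[i] = "yi2" if out[i + 1].endswith("4") else "yi4"
    return out
```

So "yi1 xia4" comes out as "yi2 xia4", matching the pronunciation the comment says current TTS voices get wrong.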
        
             | gattr wrote:
             | I love the idea of LLMs being super-efficient language
             | tutors. And you have a good point; coming soon: "We've been
             | getting a lot of these tourists here lately, they're eerily
             | fluent, but all seem to have the same minor speech
             | impediment" (read: messed-up weights in a commonly used
             | speech model).
        
               | siraben wrote:
               | I've been using ChatGPT 4 to translate and explain
               | various texts in Mandarin and it's been very on point
               | (checking with native speakers from time to time, or
               | internet searches). As expected, it has trouble with
               | slang and cross-language loanwords from time to time.
               | However for languages with much lower information online,
               | it hallucinates like crazy.
               | 
               | > coming soon: "We've been getting a lot of these
               | tourists here lately, they're eerily fluent, but all seem
               | to have the same minor speech impediment"
               | 
               | Haha, if that were to pass, that would still be a far
               | better outcome than our current situation of completely
               | blind machine translation (this is especially for various
               | Asian languages that are very sensitive to phrasing) and
               | mispronunciation by non-native speakers.
        
               | bityard wrote:
               | > all seem to have the same minor speech impediment
               | 
               | Ah, that is called an accent.
        
               | dontupvoteme wrote:
                | Kind of. Accents are typically derived from the
                | intersection of natural languages, specifically which
                | ones you learned the phonetics of first. (With the
                | exception of the Mid-Atlantic accent...)
                | 
                | This would be something quite novel, as the speech
                | irregularities would not have their origin in people.
                | 
                | I don't know what you would call it, but IMO it needs at
                | least some adjective before "accent" to differentiate it.
        
             | rnjesus wrote:
             | the azure neural tts voices in chinese are the best i've
             | heard, specifically the "xiaochen" voice. i use it in anki
             | daily to generate sentences for my mandarin decks with an
             | api key/plugin. it's not something you run locally of
             | course, but they have a decent enough free tier.
             | 
             | i'm hoping a voice as realistic as this becomes a local app
             | soon, but i've not found anything that's nearly as natural
             | sounding yet. (also, honorable mention to chatgpt's "sky."
             | she pronounces mandarin with a funnily american accent, but
             | it sounds natural and not as robotic as the open-source
             | alternatives i've tried)
        
         | meowtimemania wrote:
          | There are already a few of them. Check out https://hallo.ai
        
           | 999900000999 wrote:
           | I wouldn't feel good about anything that's not focused on a
           | single language.
           | 
           | You end up with the Duolingo problem where you know to say
           | the names of 20 different fruits but not how to introduce
           | yourself.
        
             | apwell23 wrote:
             | > You end up with the Duolingo problem where you know to
             | say the names of 20 different fruits but not how to
             | introduce yourself.
             | 
              | Not sure if this is a Duolingo problem. There are modules
              | in Duolingo specifically for saying your name. I think it's
              | the travel module.
        
             | coldtea wrote:
             | Never seen that in Duolingo. It starts with the basics and
             | phrases, not random useless vocabulary.
        
               | cptskippy wrote:
               | I was going to Italy and started using Duolingo to try
               | and help. I learned such useful phrases as "the children
               | have bread".
        
             | gs17 wrote:
             | Duo has a different problem for me. The lack of focus means
             | some languages don't get features. Chinese still doesn't
             | have Stories (there's an unofficial version of it, but
             | we've been waiting _years_ ).
        
             | numpad0 wrote:
              | (The Duolingo problem, AIUI: Duolingo is designed around
              | the premise that, by exposing your subconscious to a small
              | set of words and phrases in the target language, your brain
              | should be able to trivially construct output shims from
              | Universal Grammar, which must exist, to the desired
              | language; but that doesn't work in practice, and you end up
              | with the small set of words and phrases your subconscious
              | has recorded.)
        
             | massimokris wrote:
              | Duolingo's problem is not that they have a bunch of
              | languages; it is that achieving fluency in a target
              | language is about being able to produce/generate phrases,
              | and they just have you consume and sort words and phrases.
              | With an AI language tutor, the student must produce phrases
              | in order to practice, and that is what advances them on the
              | path to fluency.
        
         | jahewson wrote:
         | Isn't having the AI do it for you better than having the AI
         | teach humans to do it?
        
           | dylan604 wrote:
           | Sure, if you're not into personal growth. Not everyone wants
           | to become the useless bit of lard sitting in a chair while a
           | computer does everything for them. Yet. Some of us still like
           | to do the actual things, but just need some assistance along
           | the way. We still have a bit of time before we're all the
           | humanoids from Wall-E
        
             | ericmcer wrote:
             | Yeah thats why I mill my own grain and am getting into
             | textiles.
        
               | djvdq wrote:
                | I love when people use these pathetic extreme examples
                | when they don't have any meaningful arguments.
        
               | ericmcer wrote:
               | That isn't an extreme example at all, people used to mill
               | grain and make clothing by hand, now we don't. We somehow
               | are not sitting around getting fat even though technology
               | takes care of those tasks.
               | 
                | The parent's suggestion is that if we don't have to learn
               | languages that will lead to us all laying down drinking
               | big gulps while robot slaves take care of us. Their take
               | is the extreme example. People have literally made this
               | same suggestion about every technological advance and it
               | never comes true.
        
             | TeMPOraL wrote:
             | > _We still have a bit of time before we 're all the
             | humanoids from Wall-E_
             | 
             | Obligatory reminder that the movie itself explains that
             | people are what they are _not_ because of their lifestyle,
             | but because of the time spent in low-gravity environment.
        
               | dylan604 wrote:
               | not sure that really matters to the point
        
           | modeless wrote:
           | Even a perfect human translator following you around wouldn't
           | be anywhere near as good as knowing the language yourself.
        
           | whoisburbansky wrote:
           | It depends on what your goal is; for some tasks it's possible
           | that getting the AI to do it is best, but, e.g. the existence
           | of auto-pilot doesn't mean that hobbyist pilots wouldn't
           | benefit from/enjoy exercising the same skills manually.
        
           | swatcoder wrote:
           | _Maybe_ prior to fluency, for something like an odd business
           | or tourist trip.
           | 
           | But there's a point in language learning where you can come
           | to express yourself directly in a new language without
           | intermediary "thinking" in your first tongue. The
           | communicative and expressive potential of that mode is much
           | higher than trying to squeeze one's intent through any kind
           | of translation, machine or internal.
           | 
           | Plus, you know, it's fun.
        
           | j33zusjuice wrote:
           | Not necessarily. It depends on the use case. For taking a
           | vacation, having an AI that can instantly translate to your
           | native language would be amazing. That'd solve a lot of real
           | world problems, no doubt.
           | 
           | However, translation has a great deal of subjectivity
           | embedded in it, particularly when there aren't 1:1
           | translations. Case-in-point: there are many English
           | translations of the Christian bible, all similar enough, but
           | there are enormous variations in some cases. And there are at
           | least as many branches of Christianity as there are English
           | translations of the Bible. Some of them strictly recommend
           | the same translation, and they still disagree on the meaning
           | of various passages.
           | 
           | Besides the problems inherent to translation, learning
           | another language gives you another paradigm of thinking. The
           | words we use, the way we construct sentences, etc., all
           | impact our view of the world. Here's a paper that discusses
           | the impact of the over-reliance on English in cognitive
            | sciences, and how this has downstream effects:
            | https://www.sciencedirect.com/science/article/pii/S136466132...
           | 
           | Learning languages as an adult also has protective benefits.
           | It reduces the probability of Alzheimer's (maybe dementia,
           | overall?).
        
           | coldtea wrote:
           | In the way that watching porn is better than having sex.
        
         | advaith08 wrote:
          | I've seen a lot of these, but none for Indian languages. Would
          | love to try an Indian language one!
        
           | 999900000999 wrote:
           | Are Indian languages hard for English speakers?
        
             | thinkingtoilet wrote:
              | I'm learning Hindi and there are some things that are easy
             | (phonetic alphabet, nothing like 7 different sounds for
             | 'ough') but the sentence structure is very different and
             | can be hard to get right. Pronunciation isn't too bad for
             | the most part but there a few tricky things, for example
             | four different 't' sounds and four different 'd' sounds.
             | The hardest part is that there really aren't that many
             | resources. Even though Hindi is the third most spoken
             | language in the world, you will find far more resources for
             | many of the less spoken European languages.
        
         | tmountain wrote:
         | Started a project to do this a while back. It's pretty fleshed
         | out:
         | 
         | https://www.parcero.ai/
         | 
         | I could integrate this instead of Polly pretty easily.
        
         | bilsbie wrote:
         | I think it would be so ironic if advanced AI ended up simply
         | teaching us new languages quickly instead of translating for
         | us.
        
           | toomuchtodo wrote:
           | Might be able to generate a better language than what we
           | have.
        
             | bilsbie wrote:
             | Good point. Maybe they invent a better language and easily
             | teach it to everyone.
        
           | dontupvoteme wrote:
           | Finally Esperanto has a use case!
        
         | spaceywilly wrote:
          | To me the key functionality for any language learning app is
         | giving you feedback on your pronunciation and general
         | understanding. I've been using Duolingo to learn Mandarin and
         | when I try to speak to anyone it's difficult for them to
         | understand me, because my pronunciation is all wrong. The app
         | is just feeding info to me one way, and I can try my best to
         | recreate what I'm hearing, but there's no way to know if I'm
         | messing it up. They do have a speaking feature but it doesn't
         | work very well, certainly not to the same level as speaking
         | with a real person who is fluent in the language and having
         | them correct you.
        
           | throwaway4aday wrote:
           | As a quick solution, you should try recording yourself
           | speaking and then listen to it to check your pronunciation
           | against some reference. So for example, find a YouTube video
           | in the language you're learning that also has good subtitles
           | (use https://filmot.com/ ) and listen to how they say the
           | phrase and then record yourself saying the same phrase and
           | play it back and compare.
        
           | dog321 wrote:
           | I practiced for a long time using the below pronunciation
           | trainer and I get a ton of compliments from native speakers
           | on how accurate my pronunciation is.
           | 
            | https://fluent-forever.com/product/fluent-forever-pronunciat...
        
         | inbread wrote:
         | I built just this a month ago with the Azure AI speech API,
         | which is already pretty good at multilingual speech.
         | 
         | https://github.com/adrianmfi/gpt-tutor
         | 
         | I look forward to testing if switching to Seamless can improve
         | it further, Seamless supporting nearly 100 languages is a nice
         | improvement.
        
         | jbird11 wrote:
         | Absolutely, what I've noticed is that the current apps are
         | great for beginners but after a certain point the only way to
         | improve your ability to speak a new language is to well...
         | speak it. I built Proseable to help people move beyond the
         | generic how to order a coffee or ask to go to the bathroom, and
         | have more meaningful conversations in the real world. Check it
         | out!
         | 
         | https://www.proseable.com/
        
         | Jeff_Brown wrote:
         | > game
         | 
         | Yes! Better yet, you're a spy, or a hostage negotiator, or the
         | leader of any kind of enterprise (army, business, aid
         | organization) ...
         | 
         | Programming games like that will resemble directing improv
         | theater. You can't program every response; you'll have to
         | instead fit each character with beliefs and motivations.
         | 
         | I can hardly wait.
        
         | dontupvoteme wrote:
         | For Language Acquisition, Input Is All You Need. (Mostly)
         | 
         | What would be really cool is something that can autodub videos
         | or audio into your target language. The hardest problem
         | learning languages that aren't English is often finding content
         | to consume in them.
         | 
          | Disclaimer: I am a Krashenist, so this take is biased.
        
         | massimokris wrote:
         | I built one for people in Latam to practice languages in a
         | conversational way through a WhatsApp chat
         | https://wa.me/+5491162951713?text=hola%20Speakeasy
        
         | flanbiscuit wrote:
         | I would love a game that helped you learn a language (not
         | necessarily VR though as I don't have that equipment). The game
         | drops you into a world (a country of the language the game is
         | meant to teach you) where no one speaks your language and you
         | have to figure out what people are saying in order to fulfill
         | quests. You get some hints, like maybe you have a simple
         | translation guide in your inventory or sometimes you meet
         | people who can speak a few words of your language. That would
         | motivate me to learn faster than self-taught tutorials.
         | 
         | I'd love to learn French and the game would take place in
         | locations all around modern France.
         | 
          | It would have to have a good story. Maybe something in the
          | style of the Professor Layton series could be interesting, or
          | something more open-world.
        
         | dwighttk wrote:
         | and the language tutor company could have you pilot around a
         | menial labor droid while you are learning...
        
         | zbyforgotp wrote:
         | But will people use them?
        
       | pnut wrote:
       | I was hoping to find out, that the actor's voice in the demo
       | video was generated, or that he had recorded the video speaking
       | in another language or something.
       | 
       | That would have been the knockout punch.
        
       | polygamous_bat wrote:
       | "The Babel fish is small, yellow, leech-like, and probably the
       | oddest thing in the Universe. It feeds on brainwave energy
       | received not from its own carrier, but from those around it. It
       | absorbs all unconscious mental frequencies from this brainwave
       | energy to nourish itself with. It then excretes into the mind of
       | its carrier a telepathic matrix formed by combining the conscious
       | thought frequencies with nerve signals picked up from the speech
       | centres of the brain which has supplied them. The practical
       | upshot of all this is that if you stick a Babel fish in your ear
       | you can instantly understand anything said to you in any form of
       | language. The speech patterns you actually hear decode the
       | brainwave matrix which has been fed into your mind by your Babel
        | fish.
        | 
        | "Now it is such a bizarrely improbable coincidence
       | that something so mind-bogglingly useful could have evolved
       | purely by chance that some thinkers have chosen to see it as a
       | final and clinching proof of the non-existence of God.
       | "The argument goes something like this: 'I refuse to prove that I
       | exist,' says God, 'for proof denies faith, and without faith, I
       | am nothing.' 'But, says Man, the Babel fish is a dead giveaway,
       | isn't it? It could not have evolved by chance. It proves you
       | exist, and, by your own arguments, you don't. QED.' 'Oh dear,'
       | says God, 'I hadn't thought of that,' and vanishes in a puff of
       | logic."
        
       | fassssst wrote:
       | Try the demo here, you record a video of yourself and it does
       | voice cloning and a comparison:
       | 
       | https://seamless.metademolab.com/expressive/?utm_source=meta...
        
         | ceejayoz wrote:
         | > This research demo is not open to residents of, or those
         | accessing the demo from, the States of Illinois or Texas.
         | 
         | Interesting mix.
        
           | solardev wrote:
           | Illinois has a facial recognition / cloud biometrics ban.
           | Familiar face detection for doorbells etc. isn't allowed
           | there. Wonder if Texas has something similar?
        
             | ceejayoz wrote:
             | Ah, that makes sense.
             | 
             | In Texas it seems to be part of AG Paxton's culture war
             | stuff. https://www.texastribune.org/2022/05/12/texas-face-
             | filters-i...
        
           | aschla wrote:
           | Likely related to biometrics laws. I know Illinois has
           | restrictions on the collection of biometrics, not sure about
           | Texas. Facebook in particular paid out a significant amount
           | of money in a class action in Illinois, I know because I got
           | a chunk of change from it.
        
             | dylan604 wrote:
             | which you mean someone took a dime and carved off a piece
             | of it, and then sent you a piece of paper with postage that
             | cost more than the value of that chunk? yeah, we all got
             | hosed by that one too i'd imagine
        
               | ceejayoz wrote:
               | https://www.nbcchicago.com/news/local/illinois-facebook-
               | user...
               | 
               | > According to the Settlement Administrator, payments to
               | class members between $200 to $400 started going in the
               | mail May 9.
               | 
               | I got a $0.19 check from an iTunes settlement once, but
               | this wasn't one of those cases.
        
           | jlund-molfese wrote:
            | It's because of
            | https://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=3004&C...
           | 
           | Facebook has had to pay out hundreds of millions of dollars
           | in settlements for related class-action lawsuits, and rather
           | than trying to get informed consent, they're deciding not to
           | collect biometrics from residents of those states.
        
         | SillyUsername wrote:
         | And that demo is now overloaded and fails to translate the
         | input :D
        
         | teacpde wrote:
          | As someone working in tech who follows the progression of AI,
          | I believe I have the right expectations. But it still feels
          | surreal seeing myself speak a foreign language in my own
          | speech style.
        
         | wedn3sday wrote:
         | Well that was spectacularly bad. Failed to translate a single
          | word from English->Spanish. Admittedly I was using George
          | Carlin's favorites, but if you're trying to have an expressive
         | language translator that refuses to translate "fuck" then what
         | you've got is bullshit.
        
       | StrangeDoctor wrote:
       | Any more info about the watermarking? Only Meta can make the
       | determination?
       | 
       | Edit: I can't find the weights but if I'm reading the paper right
       | anyone could train their own detector.
        
         | hadyelsahar wrote:
         | Hey! a RS from Meta seamless team here.
         | 
         | Yes, we chose not to release the watermark detector to
         | safeguard against adversarial attacks. This decision helps
         | prevent any attempts to erase the watermark by malicious users.
         | 
          | The watermark generator and detector are trained together.
          | You can use the information in our paper to train your own
          | generator and detector models; however, in that case the
          | watermark signature created will be distinct from the one we
          | use to protect our Seamless translation models. This approach
          | ensures each model maintains its unique security features.
        
           | StrangeDoctor wrote:
           | Thanks for clarifying, and seems like a completely reasonable
           | approach. Thanks for the great work.
        
       | gagabity wrote:
        | I had pretty terrible results when I tried English -> Swahili
        | using the Hugging Face M4T V2 Space. It pretty much doesn't
        | work most of the time, and I just get English back with a
        | different voice. Expressive, on the other hand, only seems to
        | have a few languages.
        | 
        | It would be nice if they could lay out what exactly is missing
        | in terms of data to make a language work better; while the
        | actual AI bit is out of reach for most of us, maybe we could
        | provide more data.
        | 
        | There is also a 60-second limit; I wonder if this is a Hugging
        | Face limitation or a Seamless one?
        
         | yorwba wrote:
         | > maybe we could provide more data.
         | 
         | If you want to contribute by recording yourself speaking
         | Swahili, https://commonvoice.mozilla.org/sw is the place to go.
         | Although Meta has access to much larger data sets, they
         | nonetheless use Common Voice as a "known good" source. E.g. the
         | paper on their SONAR speech encoder reports experiments on
         | Common Voice data, coincidentally involving Swahili
         | https://ai.meta.com/research/publications/sonar-sentence-lev...
        
       | whbrown wrote:
       | Can anyone help demystify the licensing?
       | 
       | Besides the ACCEPTABLE_USE_POLICY, there's a CC BY-NC 4.0
       | (NonCommercial) license, a 'SEAMLESS_LICENSE' (NonCommercial),
       | but also an MIT license? It would seem these other licenses
       | contradict the MIT license, could somebody help clarify how these
       | all interact in practice?
        
         | dankle wrote:
         | MIT for the code, NonCommercial for the trained models I bet.
        
         | disattention wrote:
         | The license details are listed on the project GitHub
         | 
         | https://github.com/facebookresearch/seamless_communication#l...
        
       | jeffbee wrote:
       | How will Meta put these models into practice? I understand why
       | Google and Apple have models for their mobile OS users, but I
       | don't understand where users for Meta speech models come from.
       | Are they planning to show Instagram videos with English narration
       | in French or what?
        
         | solardev wrote:
         | Ads in any language!
        
         | polygamous_bat wrote:
         | Ads and Reels (their TikTok competitor) I imagine would be the
         | primary use-case. Imagine spreading the "wonders" of TikTok-
         | like videos to non-$native_language speaking world.
        
           | dylan604 wrote:
           | but isn't that a TikTok shtick to use the obviously fake
           | voice in your video?
        
         | crakenzak wrote:
         | They have arguably the most diverse userbase of any company,
         | with users from pretty much every single country + language
         | across all their services & apps. I could easily imagine a
          | handful of use cases where having a high-performing universal
          | translation model would be incredibly useful.
        
         | spacemanspiff01 wrote:
         | The metaverse will not have any language barriers...
        
       | beders wrote:
       | I'm thrilled to see the progress made in the last 30 years.
       | 
       | As a student in the mid-90s I worked on a system called Verbmobil
       | at the German Research Center for AI and it did speech-to-speech
        | for English, German, and Japanese in a very limited domain.
       | 
       | This was done via "classical" NLP: You had to model the domain
       | with concepts, you needed sentence parsers, semantic engines,
       | speech-to-text hand-crafted for 3 languages etc.
       | 
       | As it turns out, this approach is/was a dead-end.
        
       | kapp_in_life wrote:
       | Neat. How translatable are tones of voice for intent across
       | languages? Like does a person trying to do a "nerdy"
        | voice (nasally, whiny, etc.) in English translate to the "nerdy"
        | stereotype for a French speaker? It seems to do very well on
        | whispers, which made me wonder what could be next.
        
         | jeffbee wrote:
         | If you don't speak the language into which these models
         | translate your inputs, how do you know if or why the model has
         | generated, without being commanded to do so, a campy American
         | gay male sociolect, or an African American regional accent, or
         | some other thing that may convey unintended meaning to native
         | listeners?
        
       | apwell23 wrote:
       | .
        
         | jvolkman wrote:
         | The Google Translate app has a conversation mode.
        
       | wg0 wrote:
       | And just the other day StyleTTS[0].
       | 
       | Just text to speech has gone too far. Audio books would be mainly
       | generated on the fly like this?
       | 
       | I think some RPGs in some 5 years time might have something like
       | this:
       | 
        | - A text file that outlines characters and a loose plot/story
        | line. Human written.
       | 
       | - 3D Mesh Generation based on character description via
       | Transformers based models. Auto generated.
       | 
       | - Dialogues for each NPC via LLM.
       | 
       | - This TTS engine again based on such models.
       | 
        | Result: almost unlimited replayability. Or even edit the text
        | file and have a new world based on a new story line, with
        | characters having different personas.
       | 
       | [0]. https://news.ycombinator.com/item?id=38335255
        
         | mpalmer wrote:
         | How has TTS gone too far?
        
           | wg0 wrote:
            | Came a long way, that is: from the days of, if I recall
            | correctly, the Windows 98 screen reader.
        
       | TheCaptain4815 wrote:
       | The demo is so much fun to use. I can't wait for all these
       | technologies to start integrating into filmmaking / games.
        
       | anonzzzies wrote:
       | How far from a real-time Star Trek translator? Whisper is fast
       | enough and light enough, LLMs are getting there, so it's close
       | isn't it?
        
         | Sol- wrote:
         | Seems like there will always be latency, because it's not
         | possible to easily stream over languages that have different
         | structure. You need to wait a bit before you can start
         | faithfully translating the meaning.
         | 
         | They also mention it in one of the videos about the streaming
          | variant of their translator. But I guess the ~2 s delay they
          | mention is close enough for practical purposes.
         | 
         | I feel like for personal relationships where true real-time is
         | required, having a computer intermediary would be weird anyway
         | and you have to learn the language, at least for the time being
         | and as long as personal relationships are still relevant (in
         | the post-AI world they might not be).
        
           | forgot_old_user wrote:
           | > You need to wait a bit before you can start faithfully
           | translating the meaning
           | 
           | I guess it's possible that the AI learns about a specific
           | person over time? That way it can be confident about what's
           | being said as soon the person starts saying it
        
       | ziptron wrote:
       | If you are multilingual but have young children and plan to
       | continue residing in your current English speaking country for
       | the foreseeable future, are you opting to teach your children
       | those additional languages or are you adhering to the idea that
       | they can always learn those languages later if necessary,
       | considering it might not be essential (esp with models like
       | this)?
        
         | esafak wrote:
         | It is easier to learn multiple languages when you are young.
        
           | robga wrote:
           | There isn't a lot of good evidence behind this popular
           | conception.
           | 
            | If anything, the evidence is that it isn't true; see
            | https://journals.plos.org/plosone/article?id=10.1371/journal...
           | 
           | Any apparent causality of age of acquisition seems to be a
           | proxy of hours of exposure. It may well be that it is easier
           | for young people to rack up a lot of exposure to a second
           | language, but not much evidence that age plays much of a
           | factor for people of different ages who had the same degree
           | of exposure.
        
             | debugnik wrote:
             | > we argue that the late learners resort to computationally
             | less efficient processing strategies when confronted with
             | (lexically determined) syntactic constructions different
             | from the L1.
             | 
             | > we show that the ERP signal in response to grammatical
             | violations depends on the AoA of an L2 learner, as well as
             | on the regularity of the structure under investigation. In
             | (lexically determined) syntactic constructions different
             | from the L1, we found a gradual change in processing
             | strategies that varies by AoA, with a native-like effect
             | for early learners and a less efficient neural processing
             | strategy for later starters.
             | 
             | Although they do clarify that these effects _could_ be
             | confounded with age of acquisition instead of it being the
             | cause.
        
       | navbaker wrote:
       | Seamless Streaming looks really promising! We just had a new
       | employee start a few months back with profound hearing loss and
       | our company had no idea what to do with him from an accessibility
       | standpoint. They threw out solutions like Dragon, not realizing
       | those solutions are not real-time.
       | 
       | He ended up rolling his own solution by standing up Whisper in
       | one of our clusters and writing a basic front end and API to take
       | his laptop's mic input and chunk it every few seconds to send to
       | the model and get back text in pseudo-realtime. We got him a
       | pretty beefy Alienware so he wouldn't be tied to the cluster
       | GPUs. I can't wait to see what he does with these new models!
        
         | cgb223 wrote:
         | Just wanted to say you're a great employer to be so incredibly
         | accommodating to the point you get them an Alienware and let
         | them roll an accessibility solution
         | 
         | We need more support for employees like this!
        
           | cced wrote:
           | Second this!
           | 
            | Also, what about Apple's latest M3-series chips? Are these
            | in the same realm as Alienware in terms of AI compute?
        
             | jackson1442 wrote:
             | I think generally the consensus of Apple Silicon is that
             | they're great _for a laptop_, but still aren't going to
             | beat a dedicated graphics card + high-end CPU like i9/Ryzen
              | 9. The biggest thing going for Apple is performance/watt,
              | though, which is critical for a laptop.
        
               | cjbprime wrote:
               | I think this is missing the main reason to use Apple
               | Silicon, which is that your dedicated graphics card
               | probably has 24GB or less of RAM, whereas e.g. an M2
               | Ultra Mac Studio can have 192GB of RAM with a far
               | superior memory bandwidth to anything on x86. This is
               | important because even a "small" LLM like Llama2 13B
               | would require quantization to fit in the 24GB RAM that
               | the dedicated graphics card will give you, whereas the
               | Mac could run Llama2 70B without quantization (at FP16).
        
               | aftbit wrote:
               | Whisper doesn't need that much RAM though.
        
             | willy_k wrote:
             | They definitely are in terms of energy efficiency
        
             | nodja wrote:
              | They're better than most consumer x86 CPUs but worse than
              | using a GPU. Where they shine is when the ML model can't
              | fit in the GPU's VRAM, since you have better options for
              | RAM size with Macs.
        
           | romwell wrote:
           | >Just wanted to say you're a great employer to be so
           | incredibly accommodating to the point you get them an
           | Alienware
           | 
           | So gracious, to give a software developer some hardware to
           | run the software they _need to work_ , that costs a whopping
           | _nothing_ more than what other people in the industry get on
           | the average.
           | 
           | >and let them roll an accessibility solution
           | 
           | "You're such a good employer! You let your employee build
           | _their own_ accessibility ramp to the back entrance _in their
           | own time_ , and _even_ got them a mortar spatula to do so! "
           | We need more support for employees like this!
           | 
           | >We need more support for employees like this!
           | 
           | And less support for _employers_ like this.
        
             | Solvency wrote:
             | Not sure why you're being downvoted. Literally the
             | equivalent of building your own ramp.
        
               | freedomben wrote:
               | I didn't downvote, but I considered doing so because
               | nowhere that I saw in GP does it say _in his own time_ ,
               | and that's a critical piece of the equation.
               | Hallucinating that datum means they got the argument
               | wrong, and worse they were harshly critical of the
               | company based on that _wrongly assumed_ information.
               | 
               | It reminds me of the Homer Simpson quote, "I don't mind
               | being called a liar when I'm lying, or about to lie, or
               | just finished lying, but NOT WHEN I'M TELLING THE TRUTH!"
               | I would be equally critical if it was warranted, but when
               | it isn't it's deeply unfair to the accused.
               | 
               | If the person _wanted_ to build their own ramp, and the
               | employer let them do it on the clock, that 's a
               | completely different scenario than the employee having to
               | come in during their off-hours to build the ramp just so
               | they can go to work.
        
         | qkeast wrote:
         | Awesome! I love hearing about places making the effort to be
         | inclusive.
         | 
         | As someone who's profoundly deaf myself, another less technical
         | approach is to install Rogue Amoeba's Loopback, and use it to
         | pipe audio from a given app into a tool like Google Meet or
         | Otter.ai using the Loopback device as the audio source. This
         | effectively provides real time captions for anything running on
         | your existing machine.
        
           | tuukkah wrote:
           | Clever use of Google Meet as a tool! Also, Google Pixel
           | phones now provide realtime captions to any speech playing on
           | the phone (Accessibility > Live Caption). You can also choose
           | a "preferred language" and the captions will be automatically
           | translated to that language from other languages.
        
           | jallmann wrote:
           | Google Chrome [1] also has captioning built-in [2], so this
           | could also work from a plain page that hooks into the
            | loopback device. Pretty sure it's using the same speech-to-
            | text backend that Google Meet uses.
           | 
           | The nice thing about Chrome feature is you can move the
           | caption box around and keep it in the foreground while doing
           | other things, although styling options seem limited (the text
           | might be a little small for some).
           | 
           | [1] on desktop, not sure about mobile
           | 
           | [2] via chrome://settings/accessibility -> Live Caption
        
           | romwell wrote:
           | >Awesome! I love hearing about places making the effort to be
           | inclusive.
           | 
           | The extent of the effort being getting their employee a
           | slightly-more-expensive-than-average tool that would enable
           | them to do their job better _regardless_ of the disability?
           | 
           | Such inclusive, much pat-yourself-on-the-back, wow.
           | 
           | "We gave our woodworking shop employee a quality saw so that
           | they'd make _their own_ accessibility ramps! "
        
             | callalex wrote:
             | What would you have them do instead?
        
             | qkeast wrote:
             | I have literally been told in job interviews that the
             | company would not be "allowed" to hire me because I'm
             | hearing impaired, so yes, making an effort to support an
             | employee's disability and their needs is worth recognizing.
        
             | RogerL wrote:
             | So what? Okay, in the case of a ramp, if you need one you
             | probably are going to have difficulty building one. So pay
             | employee Sally to build it instead, absolutely.
             | 
             | But hearing loss does not impair standing up servers and
             | software. They can pay the employee who probably is the
             | expert at this, the guy with the hearing loss, or go task
             | Emil to go do it to ... avoid 'appearances'?
        
         | pawelduda wrote:
         | That's very nice of you
        
           | romwell wrote:
           | >He ended up rolling his own solution
           | 
           | >That's very nice of you
           | 
           | ...doesn't compute.
           | 
           | What exactly was nice here?
        
             | diab0lic wrote:
             | > We got him a pretty beefy Alienware so he wouldn't be
             | tied to the cluster GPUs.
             | 
             | Probably this.
        
         | lovich wrote:
         | Y'all should turn that into a product, or at least open source
         | it and get the positive PR + helping others
        
           | FloatArtifact wrote:
           | > Y'all should turn that into a product, or at least open
           | source it and get the positive PR + helping others
           | 
           | There you go. https://github.com/dictation-toolbox/dragonfly
        
         | kylixz wrote:
         | I recommend checking out: https://talonvoice.com/
        
           | FloatArtifact wrote:
           | It's not open source nor does the author intend to open the
           | stack.
        
         | aftbit wrote:
         | Check out Willow! It does essentially this, using WebRTC. It
         | doesn't handle the near-real-time response yet, but it does
         | stream the audio to the server and the change would be pretty
         | minor.
        
           | FloatArtifact wrote:
           | > Check out Willow! It does essentially this, using WebRTC.
           | It doesn't handle the near-real-time response yet, but it
           | does stream the audio to the server and the change would be
           | pretty minor.
           | 
            | Simple voice-to-text is not what's needed for dictating
            | commands. Unless I can load commands on the fly and decode
            | utterances against them, it's not that useful.
           | 
           | The client would need to be able to send its commands to the
           | server on the fly.
        
         | FloatArtifact wrote:
          | The problem with Whisper is that it's not really optimized
          | for command recognition versus general dictation.
         | 
          | - Whisper processes 30-second audio chunks, so if you process
          | 5 seconds of audio you have to pad it out with 25 seconds of
          | silence. That wastes CPU/GPU cycles on the 25 seconds of
          | padding per chunk in the case above.
         | 
          | - Whisper most likely can't handle hundreds of commands
          | performantly, much less a thousand.
          | 
          | - Whisper doesn't handle short commands with a high degree of
          | accuracy, nor post-processing commands out of free dictation
          | utterances.
         | 
         | Command dictation should be weighted higher than general
         | dictation when decoding.
         | 
          | I work with a little under 1,500 commands in Dragon
          | NaturallySpeaking. DNS is hot garbage as a program, despite
          | having the best accuracy to date and supporting commands and
          | dictation in one utterance. You get to pay $750 for the
          | privilege.
         | 
         | I've yet to see a free and open source speech recognition
         | engine that can handle both dictation and commands with a high
         | degree of accuracy.
         | 
         | Please please let me know if there's alternatives out there. I
         | would definitely pay to support an open source project like
         | this that focuses on command and dictation.
         | 
          | Most open source solutions nowadays focus so much on IoT
          | command recognition with intents. That's not well suited for
          | controlling your computer with grammars containing voice
          | commands.
        
           | novok wrote:
           | Is 30s the input size set by the model, or programs that wrap
           | the model? Is it how it's trained?
        
             | bakkoting wrote:
             | It's a property of the model itself.
             | 
             | > Input audio is split into 30-second chunks, converted
             | into a log-Mel spectrogram, and then passed into an
             | encoder.
             | 
             | https://openai.com/research/whisper
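The fixed window is what forces the padding described upthread. A minimal sketch of that step, assuming the behavior of `pad_or_trim` from the openai-whisper package (only the padding logic is shown, not the log-Mel spectrogram or encoder):

```python
# Sketch of Whisper's fixed 30-second input window. pad_or_trim here
# mimics whisper.pad_or_trim from the openai-whisper package.
import numpy as np

SAMPLE_RATE = 16000           # Whisper resamples all audio to 16 kHz
N_SAMPLES = SAMPLE_RATE * 30  # one fixed 30-second chunk

def pad_or_trim(audio: np.ndarray) -> np.ndarray:
    """Zero-pad short clips with silence, or trim long ones, to 30 s."""
    if len(audio) >= N_SAMPLES:
        return audio[:N_SAMPLES]
    return np.pad(audio, (0, N_SAMPLES - len(audio)))

# A 5-second command still becomes a full 30-second encoder input,
# so 25 of those seconds are padded silence:
command = np.zeros(SAMPLE_RATE * 5, dtype=np.float32)
print(len(pad_or_trim(command)) / SAMPLE_RATE)  # -> 30.0
```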
        
         | sagz wrote:
         | Do they need realtime transcription?
         | 
          | Computer: webcaptioner.com
          | Android: Live Transcribe (g.co/livetranscribe)
          | iOS: Live Caption with the 'mic' icon enabled.
         | 
         | Web conferencing: Meet, Zoom, Teams all support realtime CC,
         | which is pretty good.
        
       | londons_explore wrote:
       | Does "reduce toxic words" and "promoting safer communication"
       | mean that if you say something wrong about LGBTQIA+ people it
       | will 'correct' what you say?
       | 
       | I'm not sure I want the latest twitter trend to be involved in
       | the design of my translator...
        
         | jwineinger wrote:
         | Their video said it was to reduce toxic word hallucinations,
         | which does seem admirable/useful. I'm testing real-time
         | translation in a church setting, and I've witnessed whisper
         | hallucinating profanity, which is quite undesirable.
        
           | cgb223 wrote:
           | "Toxic word hallucination" would be a great punk rock band
           | name
        
           | kelseyfrog wrote:
           | It also happens to be quite hilarious.
        
         | mortimerp9 wrote:
         | Hi, I work on seamless. What this refers to is added toxicity
         | mitigation. We try to detect the level of toxicity in the input
         | and make sure that the output toxicity level is not higher.
          | This protects the model from making egregious errors in the
          | translation.
          | 
          | There are more details in the paper, and the mitigation code
          | is all open source if you want to check what it actually
          | does.
        
           | Domenic_S wrote:
           | > _What this refers to is added toxicity mitigation._
           | 
           | Oh, well _that_ clears it up!  </snark>
           | 
           | I don't see any definition of 'toxicity' on the landing page
           | - it seems to be one of those 'I know it when I (hear) it'
           | kind of words... unless there's some widely-accepted
           | definition in this area of study?
        
             | mortimerp9 wrote:
             | Sorry if I wasn't clear, internally we've been talking
             | about it a lot, but I forgot that it doesn't have such a
             | solid definition outside of our work. Thankfully, we try to
             | define it in section 7.3 of the NLLB paper:
             | https://arxiv.org/pdf/2207.04672.pdf
             | 
             | The tldr is that if you say: "Thank you for this job
             | offer." you wouldn't want it to be (mis)translated as "Go
             | F*k yourself.". But if you do say "Go F yourself", you
             | still want it to be translated as that.
        
           | Reubend wrote:
           | That's an awesome feature. I think one of the worst possible
           | outcomes of machine translation is something that ends up
           | being accidentally offensive, and this is a smart way to
           | mitigate that.
        
             | fl7305 wrote:
             | > one of the worst possible outcomes of machine translation
             | is something that ends up being accidentally offensive
             | 
             | The Hitchhiker's Guide To The Galaxy claims the opposite:
             | 
             | "Meanwhile, the poor Babel fish, by effectively removing
             | all barriers to communication between different races and
             | cultures, has caused more and bloodier wars than anything
             | else in the history of creation."
        
             | SoftTalker wrote:
             | Or maybe we'll finally come around to the idea that being
              | offended by _words_ doesn't make a lot of sense.
        
           | dontupvoteme wrote:
           | How do you account for colloquial (non-English) language
           | which could be naively misconstrued as toxic?
           | 
           | e.g. "geil" (either cool or horny depending on usage) in
           | German
           | 
           | It's not fundamentally different than e.g. "wicked" in
           | English, but the biggest bias that potentially all these ML
           | models exhibit is predisposition towards Anglophoneism
        
             | mortimerp9 wrote:
             | Our goal is to have a good recall, sometimes to the
             | detriment of precision, so for words with multiple
             | meanings, it might consider them toxic when in the actual
             | context they are used in, they are not. The toxicity
             | mitigation algorithm will search for alternative
             | translations that have the correct meaning but not the
             | potentially toxic word so that there is no added toxicity
              | in the output. This means that sometimes the model might
              | prefer a less colloquial phrasing than a human would.
             | 
             | You can find details on how the multi-language creation of
             | the toxicity lists was done in section 7.3 of the NLLB
             | paper: https://arxiv.org/pdf/2207.04672.pdf. TLDR: it's not
             | just a translation of a base English list, even if we
             | started from that, each language has a curated list that
             | was built by professional translators.
        
               | dontupvoteme wrote:
               | That's significantly less myopic than I pessimistically
               | assumed. Thanks!
        
           | novok wrote:
            | Is there an ability to turn it off? If you're translating
            | an R-rated movie with criminals who swear a lot, is it
            | possible to get output without the toxicity filter, to make
            | sure it's being translated properly?
        
             | mortimerp9 wrote:
              | It only kicks in if the output is more "toxic" than the
              | input. If the input has a lot of swear words and the
              | output has the same amount, then it will be left alone.
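The gate described here amounts to comparing toxicity on each side. This is an illustrative sketch only: `TOXIC_WORDS` and `count_toxic` are hypothetical stand-ins for the curated per-language wordlists and detector from the NLLB paper, and the real mitigation searches for alternative translations rather than just flagging.

```python
# Illustrative sketch of the "no added toxicity" gate described above.
TOXIC_WORDS = {"damn", "hell"}  # placeholder; real lists are per-language

def count_toxic(text: str) -> int:
    # Count wordlist hits, ignoring case and trailing punctuation.
    return sum(w.strip(".,!?").lower() in TOXIC_WORDS for w in text.split())

def needs_retry(source: str, translation: str) -> bool:
    # Retry only when the output is MORE toxic than the input; a
    # faithfully translated swear word passes through untouched.
    return count_toxic(translation) > count_toxic(source)

print(needs_retry("thank you for the offer", "what the hell"))  # True
print(needs_retry("what the hell", "what the hell"))            # False
```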
        
         | beardicus wrote:
         | the site makes it pretty clear in multiple places that they're
         | talking about "added" or "hallucinated" toxicity. maybe your
         | culture war outrage is misplaced?
        
           | Domenic_S wrote:
           | Ok so I know nothing about how this works. It seems like if
           | the model was able to properly detect words in the first
            | place, it would never hallucinate 'toxicity'; if it _can't_
           | recognize the word with high probability, how will it know
           | whether the speaker actually said $toxicWord or whether it
           | should print something else?
           | 
           | Perhaps it's taking a Big List of Naughty Words and weighting
           | them so that the system must be "extra sure" that's what the
           | speaker said, or else fall back to a G-rated word?
        
             | numpad0 wrote:
             | Maybe it's for preventing unwarranted fucks[1]? Translation
             | is more than just concatenating dictionary definitions, and
             | machine translations routinely make this kind of out-of-
             | place and technically correct lookups.
             | 
             | 1: https://www.google.com/search?q=engrish+fucking+sign&tbm
             | =isc...
        
             | mortimerp9 wrote:
              | Meta employee here. The system is not perfect, or it
              | would not "hallucinate"; while it's pretty good, it does
              | sometimes make errors (not just hallucinations, but also
              | mistranslations due to noise in the training data). What
              | we want is to prevent these errors from introducing
              | toxicity (think swear words) that wasn't in the input, as
              | this could be very bad for the user. There is a separate
              | system that double-checks the output (compared to the
              | input) and tells the translation model to try again if
              | it's too bad.
        
         | madeofpalk wrote:
          | Your framing of basic respect as a "Twitter trend" is...
          | bizarre.
        
         | jadbox wrote:
         | Your comment seems to imply LGBTQIA+ is just a Twitter trend,
          | versus people's lived experience and lifelong identity. This
          | is as unnecessarily judgmental as small identities claiming
          | that straight people must self-identify as cis.
         | 
         | There is no moral superiority to deny or force label other
         | people's identities. You're an attack helicopter? Great, roger
         | dodger, let's go get coffee Seahawk.
         | 
         | No one is seriously asking for litter boxes in school bathrooms
         | or helicopter refueling stations.
        
           | mpalmer wrote:
           | > No one is seriously asking for litter boxes in school
           | bathrooms or helicopter refueling stations.
           | 
           | This feels a bit out-of-nowhere.
           | 
           | My read on parent comment was that "Twitter trends" are fast-
           | changing norms about what language is (un)acceptable. They
           | were not saying that LGBTQIA+ identity itself is a trend.
        
             | jadbox wrote:
             | Perhaps so. In light of yesterday's Russia announcement for
             | labeling the "international LGBT public movement" as terror
             | extremists, I think we should be careful what we label as
             | fads or (worse) insidious activity. Source:
             | https://www.themoscowtimes.com/2023/11/30/russia-bans-
             | intern...
        
               | mpalmer wrote:
               | You seem to me to be arguing against points no one is
               | making. You're taking the word "trend" and extrapolating
               | it to "fad" and "insidious activity" - both of which have
               | very different meanings and connotations to the phrase
               | "Twitter trend".
               | 
               | The original comment you replied to made the point that
               | they don't want their own personal expression curtailed
               | or modified according to someone else's opinion of
               | acceptable speech.
               | 
               | As someone who repudiates Russia's policies, I support
               | and agree with their point.
        
         | sjbase wrote:
         | > Please don't use Hacker News for political or ideological
         | battle. That tramples curiosity.
         | 
         | From the hackernews guidelines
        
       | zengid wrote:
       | If "toxic word hallucinations" isn't a cyberpunk phrase I don't
       | know what is.
       | 
       | (quote from the video presentation in the link)
        
         | spacephysics wrote:
         | Oh god they're gonna censor the output. Time for musk to make a
         | non-censored version lol...
        
         | drexlspivey wrote:
         | I am sorry Dave, "merde" is not in the pre-approved word list
        
         | dontupvoteme wrote:
         | I wonder if it doesn't understand the common colloquial usage
         | of "geil" in German. This sounds like it is going to mess up
         | natural language
        
       | troseph wrote:
       | I feel like naming something "seamless" is not dissimilar to
       | calling the Titanic unsinkable.
        
       | bsza wrote:
       | "We need access to your microphone and camera to record your
       | voice and translate it with your expressions."
       | 
       | None of the videos shows any modified/lip-synced footage. There
       | doesn't seem to be a reason for this thing to need access to my
       | camera.
       | 
       | Also, using it with tape over the camera doesn't seem to work
       | either. (Perhaps it needs to see facial expressions in order to
       | work?)
        
       | Havoc wrote:
        | Can this also do straight TTS, or is it translation only? It's
        | not quite clear to me from the site.
        
       | tambourine_man wrote:
        | Every video on this page is a bit out of sync with the audio.
        | Combined with the blandness of facial expressions and the
        | whole mood in general, I kept waiting for the moment when the
        | video would disclose that everything in it was created by AI.
        
       | nextworddev wrote:
       | RiP elevenlabs?
        
       | Reubend wrote:
       | Wow, after trying out the demo, I'm floored by how high quality
       | this is. The translations worked perfectly, the voice cloning was
        | "good enough", and the emotions conveyed in my voice were
        | retained pretty accurately.
       | 
       | I don't think this would fool anyone that I was a real native
       | speaker of the target language, but for casual conversation this
       | would work pretty much perfectly. It basically avoids all of the
       | traditional pitfalls of machine translation, like the unnatural
       | robotic voice that it outputs, the slow translation speed and
       | huge latency for realtime conversation, and the loss of emotion.
        
       | stephc_int13 wrote:
       | As a French native speaker, I am surprised by the low quality
       | (frankly ridiculous) voice of the French translation example.
       | 
       | Especially because the head of AI at Meta is a French guy AFAIK
       | (Yann Lecun).
        
         | sangnoir wrote:
         | They are optimizing for speed (low latency)
        
       | yread wrote:
       | Does the spanish expressive sample sound muffled for others too?
       | And the french sounds super mechanical. Hopefully, it's more
       | impressive the other way.
       | 
       | Also: "This research demo is not open to residents of, or those
       | accessing the demo from, the States of Illinois or Texas"
        
         | dentalperson wrote:
         | Yes, they all have significant 'ghosting' artifacts where the
         | harmonics are a bit fuzzy if you listen closely. AFAIK all of
         | the recent neural speech engines have this, from SoundStream to
         | EnCodec, especially in low latency causal setups. Wavenet was a
         | bit better in that regard but has fallen out of style due to
         | complexity and the lack of a bottleneck. It seems like
         | something diffusion post processing would be able to clean up.
        
         | TacticalCoder wrote:
         | The "expressive" example in french exhibits a _thick_ accent
         | which bothers me more than the mechanical aspect of the non-
         | expressive french example.
         | 
          | It's not dissimilar to some kind of "ch'ti" / "chtimi"
          | accent or a Belgian-French accent (which is itself close to
          | the ch'ti accent heard in some parts of the north of
          | France): "Ne partez pooooo" (with a longer "a" that sounds
          | nearly like an "o"; that's not proper French at all) instead
          | of "Ne partez pas".
         | 
          | That said, I'll take the non-expressive accent any day over
          | subtitles when watching video in a language I don't
          | understand: it's clearly good enough.
        
         | grogenaut wrote:
         | Illinois is possibly because they don't allow storage of
         | biometric data without express permission and I believe
         | explicit usage restrictions. So I bet they're keeping all of
         | your utterances, which would violate that law.
        
       | iFire wrote:
       | LICENSE
       | 
       | Attribution-NonCommercial 4.0 International
       | 
       | https://github.com/facebookresearch/seamless_communication/b...
        
         | iFire wrote:
         | Took me 2 minutes to find the Github.
        
       | nathanfig wrote:
       | Impressive work, really excited for this.
       | 
       | I will note though that I feel safer getting an occasional bad
       | word than I do having a translator straight up deceive me.
       | 
       | For example, "what the fuck" in English->Spanish is giving "que
       | diablos" output. Definitely toning down the meaning there.
       | 
       | If someone says something mean to me, I want to know it.
        
         | jonathanlb wrote:
         | This may be an intentional decision given that there are
         | several ways to say "what the fuck" in Spanish, such as "que
         | mierda" or "que carajos". And that's not including regional
         | expressions like "que cono" or "que chingados". So, saying "que
         | diablos" may be the most common expression across dialects
         | conveying the same meaning.
        
           | nathanfig wrote:
           | Yeah could be, I still need to read the paper to better
           | understand the safety tuning.
           | 
           | Would be interesting to see some work stress-testing the
           | ability to convey ill-intent across multiple languages.
           | Accurately conveying ill-intent is safety-critical for the
           | person being threatened.
        
       | trinovantes wrote:
        | Currently Steam bans games from using AI-generated assets (for
        | good reason). I wonder if they'll backtrack on this or carve
        | out exceptions, because this tech seems really useful for
        | indie devs to add voice work to their otherwise silent games.
        
         | yjftsjthsd-h wrote:
         | Very speculative amateur opinion: My understanding is that
         | Valve didn't exactly ban AI, they banned AI that was fed
         | copyrighted works that could possibly make the results
         | copyright infringement (
         | https://www.theverge.com/2023/7/1/23781339/valve-steam-ai-ar...
         | ). (Side note: Regardless of individual views on whether AIs
          | are just copyright regurgitators or not, I can understand Valve
         | being cautious until courts have actually decided.) So _if_
         | speech models can be made purely from assets that their
         | creators can prove they have the rights to use, it would
         | probably be easy enough to get it approved.
        
       | ChuckMcM wrote:
       | I look forward to the day where I'm wearing my headphones in a
       | foreign land and hearing all of the discussions in my own
       | language.
       | 
       | The "universal translator" which was part of Star Trek and a lot
       | of other Sci-Fi I was exposed to as a kid was something I was
       | really fascinated with. My Dad worked as a simultaneous
       | French->English translator and sadly spent long hours away from
       | home and, as a kid, I started trying to build a translator so
       | that it could do his work and he could be home more.
       | 
       | Translation is important work and one that could help a lot of
       | people. It's my hope that we get to the point where these models
       | work entirely on locally carried resources.
        
         | sacvnsune wrote:
         | If I am not wrong, Google Pixel buds offer live translate
         | feature.
        
           | echelon wrote:
           | Not in the voice of the original speaker.
        
             | stevenicr wrote:
             | now if I could just get the pixel buds tech to remove the
             | voice of the original speaker and translate some youtube
             | videos from thick accent english into no accent am-english.
        
               | ChuckMcM wrote:
               | This is a really interesting use case. I could definitely
               | see this as a service for content providers to get more
               | reach and I think you could justify a subscription price
               | for the service based on this.
               | 
                | By creating speaker-specific tonal ranges and profiles
                | you maintain better cohesion in the final product.
        
               | keerthiko wrote:
               | Obligatory, not directed at you in particular since I'm
               | sure you mean no offense, but just voicing a pet peeve:
               | 
               | I grew up bilingual outside the US, and speak English
               | with a hybrid British/Indian/Middle Eastern accent (with
               | some of my personal quirks, and mixing increasing amounts
               | of various American accents over time). I can understand
               | English in nearly any accent (Singaporean, Chinese,
               | Vietnamese, Indian, Nigerian, eastern European) as long
               | as the words involved are globally used and the grammar
               | is passably queen's. Especially after hearing it for
               | about an hour. And people who natively speak English with
               | these various accents usually can understand my English
               | better than they can an average American accent. Yet in
               | this country, my accent is belittled, despite being
               | perfectly understood and more versatile. Even by others
               | who don't speak with the American accent!
               | 
               | This is the problem of the "default accent" anywhere
               | being referred to as "no accent", and therefore anything
               | deviating is considered "having an accent". This makes
               | "accent" a negative trait, scaling from 0-bad to heavy-
               | bad. But if the vernacular were such that we said
                | "American accent" instead of "no accent", then no
                | one's accent is bad, just unfamiliar.
               | 
               | Most of my non-American peers who were raised on English
               | have a better command of the language than my American
               | ones, yet they are mocked for their accents as if they
                | don't know the language, when in reality it's the
                | Americans' lack of familiarity with the language (as
                | it's used globally) that prevents them from
                | comprehending it.
               | 
               | So yes, put in more work, the world is shrinking and
               | English is the global language (for better or worse).
               | What you're saying is spoken from a position of privilege
               | because the culture allows you to mock others' accents
               | and imply your version of it is the correct one that
               | everyone else should put in work to provide you with,
               | rather than the other way around.
               | 
               | Every time you hear English with an accent other than
               | British, American or Australian, remember that it usually
               | means the speaker knows at least one entire other
               | language as well, probably one that you would sound like
               | an idiot if you tried to speak it. Don't be rude or
               | dismissive of their command of English.
               | 
               | In fact, you were so close -- you called it a "no accent
               | am-english", when you could have just called it what it
               | is -- "an american accent".
        
               | freedomben wrote:
                | I'm not OP, but doing what you did is a pet peeve of
                | _mine_:
               | 
                | > _What you're saying is spoken from a position of
               | privilege because the culture allows you to mock others'
               | accents and imply your version of it is the correct one
               | that everyone else should put in work to provide you
               | with, rather than the other way around._
               | 
               | > _Every time you hear English with an accent other than
               | British, American or Australian, remember that it usually
               | means the speaker knows at least one entire other
               | language as well, probably one that you would sound like
                | an idiot if you tried to speak it. Don't be rude or
               | dismissive of their command of English._
               | 
               | This is so uncharitable an interpretation of GP that it
               | makes me wonder if it's Poe's Law at play and you're
               | actually trolling. Nevertheless, I will assume you are
               | being serious and address your comments as such.
               | 
               | You clearly have some deeply held frustrations (at a
               | minimum), but unless you have a history with GP and
               | therefore a _lot_ more context on them than I do from
               | just reading these comments, or unless GP edited their
               | post in between my writing this and reading yours, then
                | you are majorly projecting upon them based purely upon
                | negative stereotypes that you harbor against Americans.
                | If I've missed the mocking or rude dismissiveness you
               | refer to, then please point it out with a direct quote so
               | I can further examine what you are referring to.
               | 
               | There definitely are people (and definitely some
               | Americans, though it's certainly not monopolized by them.
               | I was once ridiculed by locals in Mexico City for my
               | terrible Spanish) who "mock" accents and are generally
               | assholes who don't appreciate the difficulty of speaking
               | a non-native language, and many of them would deserve the
                | criticism you've levelled at GP. But in unloading those
                | accusations and chastisement at a person without cause,
                | I don't think you're behaving any better than the
                | people you would criticize.
        
               | archagon wrote:
               | I don't think it's unreasonable to remind people that a
               | "default" accent does not exist, and that AI-editing an
               | accent out starts to feel a bit like dystopian identity
               | erasure and homogenization. Even if we scope ourselves to
               | Americans speaking English as a first language, there are
               | dozens of diverse accents across the country.
        
               | ChuckMcM wrote:
               | I think this is one of those times when my Mom,
               | understanding my desire to be understood and to ask
               | questions about motives and related understanding, would
               | observe the, oblivious to me, effect of inflaming the
               | conversation and say, "Charles, this is not the time."
               | :-)
        
               | archagon wrote:
               | I don't like seeing a comment that's relatively
               | reasonable get greyed out just because it grinds
               | somebody's gears. Alas, I only have one counter-downvote
               | to give, so I feel obliged to comment.
        
               | stevenicr wrote:
               | My original statement was wanting a translator device,
               | hardware or software, so I could understand and learn
               | better.
               | 
                | There was no desire for identity erasure or
                | homogenization; leave whoever's voice the way it is
                | online, just give me an option to translate it. I added
                | more about my issue downthread.
               | 
               | Diverse accents across the country. - absolutely! which
               | is why I said 'no accent am-english.' (for me, as I can't
               | learn well outside that) - and assuming if this tech
               | exists it could help me, and perhaps be tweaked to change
               | to other accents for other people.. also mentioned in
               | downthread reply.
        
               | stevenicr wrote:
               | I appreciate your sharing, and stating that you assume I
               | meant no offense, and that your thoughts are not directed
               | at me specifically.
               | 
               | I could have been more specific, but my request for the
               | tech to vary, I think would lead to specific options for
               | different people.
               | 
               | And actually to be even more.. not sure the word.. I want
               | 'the Chicago accent' I think it's called, or midwest / no
               | accent. Personally as much as I enjoy some entertainment
               | from Jersey / NY accents, I would not volunteer to watch
               | tutorials on tech taught by the Sopranos cast - as funny
               | as that might be (and I get if you are from the NE, you
               | may be learning just fine being taught with such a
               | language style).
               | 
               | As annoying as some of the Cali style of language is, I
               | can
               | understand the words and meanings without squinting my
               | ears and spending double the brain cycles trying to
               | understand the words, while then interpreting the
               | meaning, and then trying to put together concepts for
               | understanding new ways of coding or using tech.
               | 
               | I've run into folks in Louisiana that I could not
               | understand at all and had to ask for an interpreter at a
               | gas station. From Florida to Chicago to Seattle down to
               | Miss and Ala - I can hear what people are saying and
               | learn without spending lots of extra energy trying to
               | understand.
               | 
               | With that being said, I understand there are parts around
               | Miami where accents may be thicker (or not) - and with
               | some folks even if using the right words and grammar, I
               | may need to slow down the speech to actually learn if
               | they were teaching a class.
               | 
               | The slow down and speed up options already exist with
               | youtube.
               | 
               | "So yes, put in more work"
               | 
               | - I do try a bit. I don't mind accents with some folks
               | and media. For example I can listen to and enjoy Shankar
               | sharing via the 'hidden brain' series, partially because
               | his accent is limited but also because the media requires
               | less thought intensity.
               | 
               | I have tried many youtubes, and bought a few courses
               | taught by folks in India and other places where I just
               | could not muster the energy. I literally squint with my
               | ears and feel like my head gets hot trying to decipher
               | what is being said, translate into what is meant, and how
               | it should create new patterns of understanding in my
               | brain.
               | 
               | I can only do that for so long and I am done. Now I just
               | skip any learning video that has non-am English speakers.
               | When I consider courses to sign up for or buy, I have to
               | research the authors / speakers and find video of them to
               | hear the audio, because I just can't learn well that way.
               | 
               | "other than British," - True story, a few years ago I had
               | to call an ISP in Britain(?) and the person I got to
               | file an issue with, I could not understand them. I had
               | to ask 'what did you just say' many times. I laughed at
               | myself for even thinking of saying 'can you slow down and
               | speak clearer English please' - I mean, crazy... I was
               | paying by the minute for the long distance at the time
               | and it ended up being a 25 minute call that could have
               | been
               | 10 if I had a magic translate without accent device.
               | 
               | "a position of privilege because the culture allows you
               | to mock others' accents"
               | 
               | - This is truly not about mocking accents, this is truly
               | about my lack of ability to learn well.
               | 
               | Yes, I would definitely sound like an idiot trying to
               | speak another language. Like I said, I do not learn as
               | well as some others.
               | 
               | Truly not my intent to be rude. I apologize if the
               | shortness came off that way, I was trying to be brief in
               | the hope that there's a chance that some tech like this
               | exists and someone here could point me to it. Before I
               | posted, I DDG'ed it and found a couple of things
               | attempting to be in that space with a 'speak to sales'
               | type of 'you'll never afford this' button for info.
               | 
               | I will never be dismissive of anyone's command of
               | English, or other spoken language, or computer language
               | or anything like that. There is no way for me to know
               | how someone else's situation and circumstances led them
               | to
               | their current command of whatever language. If someone is
               | trying to learn more at any age; I applaud and encourage
               | them - being rude or dismissive does not encourage more
               | learning.
               | 
               | "no accent am-english", when you could have just called
               | it what it is -- "an american accent". - Well maybe, but
               | actually I meant to be more specific, as mentioned a bit
               | above - I mean '"no accent" American accent' - because
               | there are plenty 'American accent' types that I would
               | want removed by a magic earpiece to make it easier for me
               | to understand and learn.
        
               | keerthiko wrote:
               | I appreciate the thoughtful reply. I don't think you're
               | rude, and I get what you're saying as someone who thinks
               | a lot about accents and languages. However, I still think
               | you missed my point.
               | 
               | There is no "no accent". An accent is a baseline feature
               | of intelligible human speech, like a voice, or a volume,
               | or a language. You can't say stuff without those
               | features. When you say "the Chicago accent", or the
               | "Midwest accent", that's an accent! Not "no accent".
               | 
               | I understand it's common usage to refer to the default
               | "radio accent" as "no accent", but in a country like
               | America, all kinds of people with all kinds of accents
               | speak English. Reinforcing an expectation that a certain
               | (usu. majority-white-spoken) one is the "default" by
               | referring to it as "no accent", implicitly suggests all
               | others are erroneous affectations, even if I trust that
               | is not your personal intent.
               | 
               | All that said, I think your idea for a translation device
               | capable of revocalizing what is said with an unfamiliar
               | accent into one you are used to is not a bad one, and
               | likely easier than translating between languages while
               | retaining expressiveness.
        
         | TheHumanist wrote:
         | Babel Fish
        
         | dimitrios1 wrote:
         | Another lesson we can learn from Sci-Fi is very often different
         | species on a planet would have their tribal / local languages
         | and dialects but all spoke a common tongue. I think this is the
         | more humanizing approach, rather than delegate even more of our
         | fleshly processing power to machines.
        
           | somewhereoutth wrote:
           | This seems to be what is happening in Europe (and perhaps
           | more generally across the globe), with English being the
           | common tongue.
           | 
           | Question is, what will happen to the tribal / local
           | languages? Will they survive?
        
             | Cthulhu_ wrote:
             | It varies. A lot of local languages have gone extinct
              | already. There are linguists hard at work trying to document
             | / record dying languages, but it won't be the same as
             | living the language from childhood.
        
           | micromacrofoot wrote:
           | then of course, there's always Darmok and Jalad at Tanagra
        
         | rangestransform wrote:
         | how am i supposed to talk shit with my friends about other
         | people in public then
        
           | flanbiscuit wrote:
           | I'm curious to know how well these models can pick up slang.
           | Maybe if you talk shit in as thick a slang as you can it
           | won't be able to give a good enough translation.
        
             | kredd wrote:
             | With my bi/trilingual friends who speak the same languages,
             | we intermix them to make our point more clear. Don't think
             | models will be good enough for mixes for a few more years,
             | so we're safe!
        
               | smcin wrote:
               | Can you show us an example of such a sentence?
        
               | kredd wrote:
               | Hm, think of things like "On va bruncher" (we're going to
                | brunch). The word "brunch" doesn't exist in French, but
                | we add suffixes to fit it into the sentence. Very common
                | in Montreal. My French isn't good enough to do that on
                | the fly, but my francophone friends do it all the time.
               | 
               | In my other languages that I am actually fluent in, it's
               | kinda the same -- you use specific suffixes to soften or
               | embolden your point and so on. Maybe add "exclamation
               | making sounds in specific language" too. Eventually your
               | nouns and verbs end up in different languages, with
               | different suffixes where it "makes sense", yet the person
               | whom you're talking to will "get it".
               | 
               | Would be curious to try the new Seamless model on such
               | speeches.
        
               | bertil wrote:
               | This is extremely common for every new technology:
               | "upload," "download," "stream," "google," "FaceTime,"
               | most code patterns, all the new ML apps, "venmo" or
               | whatever the name of the app you use for payment, etc.
                | all of those are taken as is, slapped with a verb
                | termination, and it's good enough. That's true in German,
                | Danish,
               | Dutch, French, Italian, and Spanish.
               | 
               | The only thing that doesn't work is if you talk to people
               | too young to remember Skype. Then you feel old.
        
             | dontupvoteme wrote:
             | I'd love to see a map of how it matches up to regional
             | English/British accents and their slang.
        
             | fasquoika wrote:
             | Reinventing polari is certainly one way to make yourself
             | less understood...
        
           | ugh123 wrote:
           | learn Klingon?
        
             | bertil wrote:
             | Klingon is definitely going to be in the top 50 languages
             | covered...
        
           | csa wrote:
           | Speak in metaphor and/or code.
           | 
           | I've been in mixed language communities in which I wasn't
           | sure who spoke what, and I have found this to be quite
           | effective when done right.
           | 
           | Good time to reference st:ng "darmok" episode and quotes like
           | "darmok and jalad at tanagra".
        
           | buryat wrote:
           | get better at double speak
           | https://en.wikipedia.org/wiki/Doublespeak
        
         | baby wrote:
         | I'm wearing the Rayban Meta right now and they are already mind
         | blowing, I can already talk to that Meta AI assistant
          | seamlessly. I bet one of the future iterations will have
          | exactly
         | this.
        
           | figers wrote:
           | Curious, what do you ask it besides take a picture / video or
           | what's the weather?
           | 
           | I have a pair and have only asked it that so far...
        
         | diob wrote:
         | The problem is you need a full sentence, plus surrounding
         | sentences to properly translate a lot of things (aka context
         | matters).
         | 
         | So no matter what, conversations in your native speech would
         | have to be delayed before translation.
        
           | ChuckMcM wrote:
           | I think I could adapt to that. But it would be an interesting
           | experiment.
        
           | ItsMattyG wrote:
           | My understanding is that they trained a separate model to
           | specifically estimate when they have enough context to begin
           | translating, as a skilled translator would.
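
            (For readers curious what such a read/write policy looks
            like: the fixed "wait-k" schedule is the classic non-learned
            baseline from simultaneous translation research. The sketch
            below is a hypothetical illustration only, not Meta's
            learned policy, which decides dynamically when enough
            context has arrived.)

            ```python
            # Classic "wait-k" schedule from simultaneous translation:
            # read k source tokens up front, then alternate one write
            # per read, so the output lags the input by a fixed k
            # tokens. Returns the action sequence:
            #   'R' = read one more source token
            #   'W' = emit one target token
            def wait_k_schedule(num_source, num_target, k=3):
                actions = []
                read = written = 0
                while written < num_target:
                    if read < min(written + k, num_source):
                        actions.append('R')
                        read += 1
                    else:
                        actions.append('W')
                        written += 1
                return actions
            ```

            With k=3 and five tokens on each side this yields
            R R R W R W R W W W: the translator holds back three words,
            then keeps a constant lag until the source runs out. A
            learned policy replaces the fixed k with a model-estimated
            "do I know enough yet?" decision.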
        
           | DigiDigiorno wrote:
           | Even the native original version needs the proper context.
           | Sometimes you need the entire sentence to figure out what the
           | sentence was really about.
           | 
           | I'm reminded of Mark Twain complaining about verbs arriving
           | at the very end of sentencess in German (among a myriad of
           | other complaints)
           | 
           | "The Awful German Language* -Mark Twain
           | https://faculty.georgetown.edu/jod/texts/twain.german.html
        
             | scotty79 wrote:
              | Sometimes you even need a second sentence or even a few to
             | understand what the first sentence was about.
        
           | sexy_seedbox wrote:
           | So then we need something like neuralink to get the whole
           | thought from one's brain first, then the sentences are
           | processed properly for the context, then translated before
           | the speech is delivered.
        
         | freetanga wrote:
         | What most people have to say is not that interesting, and tech
         | won't change that
        
       | btbuildem wrote:
       | The near-realtime aspect of this is so promising -- we're getting
       | closer and closer to IRL babelfish!
       | 
       | What I would love to see is an ability to add my own voice (yes,
       | at the risk of deepfakes) so that the model could "speak" in any
       | language and sound more like me, not some random voice actor it
       | was trained on.
        
       | gagabity wrote:
       | Can this do speech-to-text English -> English? I get strange
       | results if I do a translation to the same language. It would
       | be an interesting alternative to Whisper if it could.
        
       | I_am_tiberius wrote:
       | I hope all these AI products will have privacy focused
       | alternatives quicker than when web2 happened.
        
       | mkagenius wrote:
       | Yet again, Hindi (the major language in India) is not even in the
       | samples. India is the largest user base of facebook (and probably
       | 1/3rd of the engineers working there are Indians) but never will
       | facebook put enough effort to contribute back. They only use
       | the DAU
       | from India in investor calls.
        
         | cafed00d wrote:
         | By "samples" do you mean examples on the marketing/landing
         | page? It sure looks like the model supports many major Indian
         | languages like Telugu, Tamil & Kannada.
         | https://huggingface.co/facebook/seamless-m4t-v2-large
         | 
         | Yeah, I kinda agree with the spirit of your comment; it sure
         | would be nice to see a major Indian language like Telugu on
         | their landing page for sure. But that's just my Indian-person
         | bias speaking.
        
           | mkagenius wrote:
           | The lack of focus shows up in the results. The models never
            | perform as well as French or Spanish on Indian languages.
           | This goes for Google, too.
        
       | gorbypark wrote:
       | I've been trying (and mostly failing) at setting up a pipeline
       | to get system audio into whisper and feed that transcription into
       | a seamless m4t text-to-text translation model. It seems like
       | seamless streaming is going to solve most of my issues, and
       | should significantly reduce latency!
       | 
       | My ultimate goal is to have realtime translations of video
       | conferences. I've moved to a new country, and while I'm super
       | privileged that most of my colleagues speak English, we still
       | have a number of "all hands" meetings that I get lost in pretty
       | easily.
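
       (A minimal sketch of the glue code for such a pipeline, assuming
       16 kHz mono samples captured from the system mixer. The
       `transcribe` and `translate` callables are hypothetical
       stand-ins you would back with Whisper and a SeamlessM4T
       text-to-text model; the overlap between windows is what keeps
       words from being cut at chunk boundaries.)

       ```python
       # Sketch of a chunked capture -> transcribe -> translate loop.
       # Overlapping windows let each chunk be transcribed while the
       # next one is still being captured.

       def chunk_audio(samples, sample_rate=16000, window_s=10.0, overlap_s=1.0):
           """Return (start_index, window) pairs over a mono sample buffer."""
           window = int(window_s * sample_rate)
           step = window - int(overlap_s * sample_rate)
           chunks = []
           for start in range(0, max(len(samples) - 1, 1), step):
               chunks.append((start, samples[start:start + window]))
               if start + window >= len(samples):
                   break
           return chunks

       def run_pipeline(samples, transcribe, translate):
           """Hypothetical glue: transcribe each chunk, then translate the text."""
           results = []
           for _, window in chunk_audio(samples):
               text = transcribe(window)        # e.g. backed by Whisper
               results.append(translate(text))  # e.g. a SeamlessM4T text model
           return results
       ```

       SeamlessStreaming should make most of this buffering logic
       unnecessary, since the model itself decides when it has heard
       enough to start emitting output.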
        
       | xnx wrote:
       | This tech from Google seems similar, but doesn't have a fancy
       | demo: https://blog.research.google/2023/12/unsupervised-speech-
       | to-...
        
       | jwineinger wrote:
       | Any ideas on what kind of hardware this would require to run
       | S2ST?
        
       | gloyoyo wrote:
       | This is so world changing! Exactly how I wanted to speak so
       | confidently!
       | 
       | Thank you Meta!
        
       | mightytravels wrote:
       | I like how easy it is to get going, but you need to download
       | about 20GB, and S2ST needs 40GB of GPU RAM!
       | 
       | It runs, but for any audio input I tried (you will need to
       | provide wav, not mp3; I tried 20s/40s/300s clips) I get just
       | one short sentence returned in the target language that seems
       | not related at all to my audio input (e.g. "Tous les humains
       | sont crees egaux" - "All humans are created equal").
       | 
       | Seems like some default text but it runs on full GPU for 10
       | minutes. Tons of bug reports on GitHub as well.
       | 
       | Text Translate works but not sure what is the context length of
       | the model. Seems short at first glance (haven't looked into it).
       | 
       | Oh and why is Whisper a dependency? Seems not needed if FB has
       | their own model?
        
       | novok wrote:
       | I wonder how well this will perform for automatic comics
       | translation. Current local models are pretty bad.
        
       | MagicMoonlight wrote:
       | > Automatically filters out toxic speech
       | > Watermarking
       | 
       | So it can't be trusted at all then
        
       | quickthrower2 wrote:
       | How did that page get camera access without my permission?
       | 
       | Edit: by the upvote I guess it wasn't just me?
        
       | rammer wrote:
       | Marketing has been heavily involved in this page...there's at
       | least one coloured person for every white photo..
        
       | asylteltine wrote:
       | It really sucks that a company so irresponsible with all your
       | data is one of the leading AI companies now.
        
       | bozhark wrote:
       | I want this as a channel in our discord.
       | 
       | Would allow more interaction between people who don't speak
       | the same language
        
       ___________________________________________________________________
       (page generated 2023-12-01 23:00 UTC)