[HN Gopher] Mozilla releases local machine translation tools as ...
       ___________________________________________________________________
        
       Mozilla releases local machine translation tools as part of Project
       Bergamot
        
       Author : Vinnl
       Score  : 355 points
       Date   : 2022-06-02 16:23 UTC (6 hours ago)
        
 (HTM) web link (blog.mozilla.org)
 (TXT) w3m dump (blog.mozilla.org)
        
       | boberoni wrote:
       | For i18n in my own projects, I typically use tools like gettext
       | and rely on lots of volunteers to do the translations. I might
       | try out these neural machine translation tools to see how they
       | fare. I also wonder if these machine translation tools are
       | trained on a corpus of gettext datasets.
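
One way to try machine translation alongside a gettext workflow is to use it only as a fallback for strings that volunteers have not translated yet. The sketch below relies on the fallback mechanism of Python's standard gettext module; `fake_translate` is a hypothetical stand-in for whatever engine you plug in, not part of any Bergamot API.

```python
import gettext


class MachineFallback(gettext.NullTranslations):
    """Fallback catalog that asks a translation callable for missing strings."""

    def __init__(self, translate):
        super().__init__()
        self._translate = translate  # any callable: str -> str
        self._cache = {}             # avoid re-translating the same msgid

    def gettext(self, message):
        if message not in self._cache:
            self._cache[message] = self._translate(message)
        return self._cache[message]


def fake_translate(message):
    """Hypothetical placeholder for a real machine translation call."""
    return f"[MT] {message}"


# In a real project this would be a GNUTranslations object loaded from a
# .mo file; an empty NullTranslations stands in for it here.
catalog = gettext.NullTranslations()
catalog.add_fallback(MachineFallback(fake_translate))

print(catalog.gettext("Save file"))
```

Strings present in the human-translated catalog are returned as usual; only lookups that miss reach the fallback, so volunteer translations always take priority.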
        
       | bogwog wrote:
       | This is awesome, but...
       | 
       | > This set of requirements posed a number of technological
       | challenges to the team: the translation engine was entirely
       | written in programming languages that compile to native code. We
       | needed a way to streamline the distribution of the project in
       | order to avoid the overhead involved in providing builds
       | compatible with all platforms supported by Firefox -- that would
       | be impracticable to scale and maintain.
       | 
       | Does Firefox really support so many different platforms and archs
       | that CI builds are unrealistic?
        
         | jelmervdl wrote:
         | The upside of using WASM is that the extension itself can be
         | easily ported to other browsers and platforms. The UI uses
         | Firefox specific APIs but the parts that take the HTML from a
         | page and push it through the translation engine would also work
         | in any Chrome-based browser.
         | 
         | (Edit: also free sandboxing of a blob of C++ code that needs to
         | handle arbitrary input from the web!)
        
         | dblohm7 wrote:
         | (Former Mozilla employee, here)
         | 
         | I'm completely speculating, but it's probably a matter of not
         | wanting to complicate iterating on the translation engine by
         | introducing a bunch of cruft from the Firefox build system
         | (which, though it uses GNU make under the hood, is very much
         | bespoke and complicated).
         | 
         | Since the translation engine is intended to run on a product
         | that hosts WASM, they might as well just build to that.
        
       | whinvik wrote:
       | Can we use this on mobile?
        
         | jeroenhd wrote:
         | The demo site works on mobile if you let it load the necessary
         | content so if you're speaking from a web dev point of view:
         | definitely.
         | 
         | As for the addon, on Android you'll need to install an unstable
         | version of Firefox and configure a custom addon list in an
         | addons.mozilla.org account that includes it so you can download
         | it.
         | 
         | On iOS there isn't any option to download addons as far as I'm
         | aware. On mobile Linux environments everything should work like
         | on desktop.
        
           | djvdq wrote:
           | You can't download any addon for Firefox on iOS because it's
           | almost Safari, only looking a bit different. All browsers on
           | iOS have to use WebKit so FF is not really FF here on iOS.
        
           | jelmervdl wrote:
           | I think the Firefox extension might not work on mobile
           | because it hooks into some undocumented addon apis to draw
           | that translation bar UI. Those might not be available on
           | mobile.
           | 
           | The translation code itself should work on mobile. It's just
           | some javascript & wasm (albeit with SIMD instructions not
           | implemented in Safari's WASM vm...)
        
             | Vinnl wrote:
             | I just installed the extension on Fenix Nightly and indeed,
             | it does not work.
        
       | option wrote:
       | What's wrong with using cloud without sending any user id with
       | the request?
        
         | vanilla_nut wrote:
         | If local translation can help me use a website without a query
         | to some cloud server... who needs the cloud? No backend that
         | will experience downtime, and someday be decommissioned. No
         | money sink of cloud processing pressuring the product to
         | advertise or monetise in unscrupulous ways.
         | 
         | I'm sure cloud processing is better in many ways. But if this
         | is "good enough" I'd rather just do it all locally.
        
         | [deleted]
        
         | chungy wrote:
         | There's likely far more identifiable information in the actual
         | text than a user ID provides.
        
         | no_time wrote:
         | In the current status quo you either make use of an api by
         | identifying yourself with a key or a browser session that is
         | fingerprintable in a gazillion ways. There is no such thing as
         | "not sending user ID" or if there is, it has a totally
         | negligible reach.
        
         | drewzero1 wrote:
         | Cloud assumes a constant, reliable internet connection, which
         | is not the reality in most of the world. (Nor is it always
         | desirable.)
        
         | jffry wrote:
         | If the data never leaves your device, then a third-party
         | service never gets the opportunity to leak or misuse it. This
         | is far more private.
         | 
         | How many stories have you heard about breaches due to
         | accidentally mis-configured logging in web services? Also in
         | the news lately was Twitter misusing 2fa phone numbers for
         | advertising purposes.
        
         | toper-centage wrote:
         | What if what I'm trying to translate is sensitive information
         | in itself?
        
         | kevin_thibedeau wrote:
         | "The telescreen received and transmitted simultaneously. Any
         | sound that Winston made, above the level of a very low whisper,
         | would be picked up by it; moreover, so long as he remained
         | within the field of vision which the metal plaque commanded, he
         | could be seen as well as heard. There was of course no way of
         | knowing whether you were being watched at any given moment."
        
           | 0des wrote:
           | "How often, or on what system, the Thought Police plugged in
           | on any individual wire was guesswork. It was even conceivable
           | that they watched everybody all the time."
        
       | no_time wrote:
       | This is incredible and super important. For all the blunders of
       | Mozilla in the last decade, they still have some great projects.
       | I am also grateful to them for not scrapping Common Voice.
        
         | _trampeltier wrote:
         | Also important, because it now seems that, at least in Germany,
         | the "translate website" button is missing from Google Translate.
         | From Switzerland I saw the button recently when I tried. I don't
         | know if that is because it can reach censored (Russian) sites.
         | My company blocks Google Translate anyway, probably for the
         | same reason.
        
           | no_time wrote:
           | Try copying the url into the translator's text field. It's
           | how I've been using it for years.
        
           | croes wrote:
           | Are you sure?
           | 
           | Under Google Übersetzer I see three buttons: Text, Dokumente,
           | Websites
        
         | [deleted]
        
         | coding123 wrote:
         | In the long run, I am a super huge fan of Mozilla and Firefox.
         | I am using it right now. After a 10 year stint of using Chrome
         | exclusively I now use Firefox as my main driver. Unfortunately
         | I still need to keep Chrome around for weird situations where
         | the website developer only tested in Chrome (Yes this still
         | exists. A shopping cart in a popular website - cough Home Depot
         | cough cough - that recently failed me in Firefox worked in
         | Chrome. I haven't tried in a couple of weeks; hopefully that is
         | fixed.)
        
         | Shadonototra wrote:
         | it's not their project, all they did was to write a form in JS
         | 
         | the whole project is an EU-funded one, all done at the
         | University of Edinburgh
         | 
         | https://cordis.europa.eu/project/id/825303
         | 
         | you giving full credit to Mozilla is dishonest, to say the
         | least
         | 
         | it aligns to their past projects, including using Mullvad and
         | slapping a Mozilla sticker on top of it to claim it as their
         | own
         | 
         | also it is super funny to read this:
         | 
         | > H2020-EU.2.1.1. - INDUSTRIAL LEADERSHIP - Leadership in
         | enabling and industrial technologies - Information and
         | Communication Technologies (ICT) MAIN PROGRAMME
         | 
         | little do they know, the EU never learns
        
           | stuartd wrote:
           | All they did was 'write a form in JS'???
           | 
           | > Our solution to that was to develop a high-level API around
           | the machine translation engine, port it to WebAssembly, and
           | optimize the operations for matrix multiplication to run
           | efficiently on CPUs.
        
       | nyanpasu64 wrote:
       | It's unfortunate that it doesn't translate Japanese, and reading
       | Japanese-only resources is a common hurdle in the retro game
       | modding/development community.
        
       | simonmales wrote:
       | Great that you can contribute your own language pairs.
        
       | obert wrote:
       | The sooner we move AI to 127.0.0.1 the better, enough with The
       | Cloud powerhouses.
       | 
       | Yes there's work to be done, resilience, power efficiency,
       | responsiveness, but it's the right direction for everything that
       | involves private computing.
        
       | Vinnl wrote:
       | I've been using the extension [1] for a bit and, while it doesn't
       | support too many languages, for the ones it does it's pretty cool
       | to have it all running locally.
       | 
       | [1] https://addons.mozilla.org/firefox/addon/firefox-
       | translation...
        
         | baobob wrote:
         | Do you know what the pipeline looks like for new language pairs
         | being added? This is really, really, really awesome
         | 
         | I'm also immediately curious about using it headless outside
         | the browser
        
           | jelmervdl wrote:
           | The training pipeline is also on Github! [1]
           | 
           | I was experimenting with running the wasm version of
           | bergamot-translator (the translation engine used by the
           | addon) in node [2].
           | 
           | However, if you want more performance, using the Python
           | library [3] or the native C++ interface [4] gets you further
           | because the wasm build is limited to a single thread and thus
           | a blocking interface, and can't use all the processor
           | specific optimisations that are in the native builds.
           | 
           | EDIT: Another option is using translateLocally [5], which is
           | a Qt desktop app based on bergamot-translator. It has a
           | native messaging API that is designed as a much faster
           | alternative to the wasm build for browser extensions, but it
           | can also be used from Python [6].
           | 
           | [1] https://github.com/mozilla/firefox-translations-training
           | 
           | [2] https://gist.github.com/jelmervdl/a4c8b6b92ad88a885e1cbd5
           | 1c6...
           | 
           | [3] https://colab.research.google.com/drive/1AHpgewVJBFaupwAb
           | Zq0...
           | 
           | [4] https://github.com/browsermt/bergamot-
           | translator/blob/main/a...
           | 
           | [5] https://github.com/XapaJIaMnu/translateLocally
           | 
           | [6] https://github.com/XapaJIaMnu/translateLocally/blob/maste
           | r/s...
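
To make the "single thread and thus a blocking interface" point concrete, here is a generic Python sketch of the usual workaround on the native side: pushing blocking calls into a thread pool so several requests can overlap. `blocking_translate` is a hypothetical stand-in that only sleeps; it is not the bergamot-translator API, and whether a real binding overlaps this way depends on whether it releases the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_translate(text):
    """Hypothetical stand-in for a synchronous engine call."""
    time.sleep(0.05)  # simulate work; time.sleep releases the GIL
    return text.upper()


def translate_all(texts, workers=4):
    """Run the blocking calls in worker threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(blocking_translate, texts))


if __name__ == "__main__":
    start = time.perf_counter()
    results = translate_all(["eins", "zwei", "drei", "vier"])
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")
```

With four workers the four simulated calls overlap instead of running back to back, which is the kind of parallelism a single-threaded wasm build cannot offer.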
        
         | clairity wrote:
         | neat, but it looks like it was just released, so how were you
         | using it before?
         | 
         | as an aside, pretty sad to see the project page,
         | https://browser.mt/ , requiring not just javascript but
         | specifically google connections to work. to 5 different google
         | properties, no less.
        
           | Vinnl wrote:
           | I work at Mozilla, so got a sneak preview (and also the first
           | bugs) :)
           | 
           | (Of course technically the work was out there in the open
           | already, since it's Mozilla.)
           | 
           | Agreed about the Bergamot website. I suspect it's not by
           | Mozilla, but I'll see if I can ask someone to take a look, as
           | I don't think all those connections should be necessary.
        
             | edko wrote:
             | Do you know if this will be open-sourced, or if the repo is
             | already available?
        
               | space_fountain wrote:
               | I think this is probably the source
               | https://github.com/mozilla/firefox-translations
               | 
               | edit: and for the actual translations
               | https://github.com/mozilla/bergamot-translator
        
             | clairity wrote:
             | awesome, thanks! i also suspect it's by the EU coalition
             | behind bergamot, so probably beyond mozilla's jurisdiction,
             | but it doesn't hurt to ask.
        
           | 0des wrote:
           | .mt huh, that's a new one for me
        
       | lovelearning wrote:
       | Stuck at "Loading translation engine..." for a long time. Tried
       | German and Spanish. Can't tell if it's downloading some model
       | data or something's failed. I suggest some kind of progress
       | indicator.
        
         | ainar-g wrote:
         | Weird. I had a numeric progress indicator, and the model got
         | downloaded in just a couple of seconds.
        
         | Vinnl wrote:
         | You might want to report that here, if it's not reported
         | already: https://github.com/mozilla/firefox-translations/issues
        
       | jeroenhd wrote:
       | Looks lovely! Offline translations are very welcome in a world
       | where the most important translation engines are also run by the
       | world's biggest data hoarders.
       | 
       | Sadly, the extension either doesn't work on mobile or Mozilla
       | couldn't be bothered to add it to the whitelist.
        
       | andrenatal1 wrote:
       | Hi, I am part of the team who developed this and the author of
       | the article. You can ask me anything about it if you have
       | questions.
        
         | msdrigg wrote:
         | Chinese language is what I most commonly want to translate. Is
         | there any planned support for this?
        
         | unicornporn wrote:
         | Google Translate code is present on many web sites to provide
         | automatic translations of text. Could your translate code be
         | uploaded to a server and embedded in a web page to provide the
         | same functionality?
        
           | jelmervdl wrote:
           | I'm not aware of any actively maintained projects that give
           | you this out of the box, but these two could be starting
           | points for such a project.
           | 
           | Mozilla implemented a REST service based on (an earlier
           | version of) bergamot-translator [1]. You could use that as a
           | replacement for the WASM component in the addon's code.
           | 
           | I also know of some full-page translation demo code that uses
           | the Python bindings of bergamot-translator [2]. That's
           | basically a web proxy a la Google Translate.
           | 
           | Lastly, marian, the translation software that's being used,
           | has a web server as well [3]. It does not support HTML
           | though.
           | 
           | EDIT: see also my earlier comment for using it with Node or
           | Python [4], which you could use to implement a simple web
           | API.
           | 
           | [1] https://github.com/mozilla/translation-service
           | 
           | [2] https://github.com/jerinphilip/tagtransfer
           | 
           | [3] https://marian-nmt.github.io/docs/#web-server
           | 
           | [4] https://news.ycombinator.com/item?id=31599231
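
As a rough idea of what "a simple web API" around a local engine could look like, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption made for illustration: the `translate` placeholder, the JSON field names (`text`, `from`, `to`) and the port. It is not how Mozilla's translation-service or the Marian web server work.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def translate(text, source, target):
    """Hypothetical placeholder: swap in a call to a real engine here."""
    return f"[{source}->{target}] {text}"


def handle_request(raw):
    """Turn a raw JSON request body into (HTTP status, response dict)."""
    try:
        payload = json.loads(raw)
        result = translate(payload["text"], payload["from"], payload["to"])
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with text, from, to"}
    return 200, {"translation": result}


class TranslateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request(self.rfile.read(length))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    """Build (but do not start) the server; call .serve_forever() to run it."""
    return ThreadingHTTPServer(("127.0.0.1", port), TranslateHandler)
```

Running `make_server().serve_forever()` and POSTing `{"text": "Hallo", "from": "de", "to": "en"}` to it returns a JSON object with a `translation` field. Keeping the request logic in `handle_request` means it can be checked without opening a socket.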
        
         | dnc wrote:
         | Hi,
         | 
         | This is an awesome project, congratulations!
         | 
         | Could you share details about the machine translation engine
         | that is used (or where to find out more about it)? Are there
         | any plans to open source the extension code (with the
         | WebAssembly optimizations that are mentioned in the article)?
         | 
         | Thanks.
        
           | jphilip wrote:
           | A fork of marian-dev[1] is the underlying machine-translation
           | engine:
           | 
           | - https://github.com/browsermt/marian-dev
           | 
           | Development of the higher-level code that wraps marian-dev to
           | make it suitable for the browser extension happens at:
           | 
           | - https://github.com/browsermt/bergamot-translator
           | 
           | Some of the WebAssembly optimizations are available in
           | bergamot-translator/marian-dev. The rest are in the Firefox
           | source code. A starting point could be
           | https://bugzilla.mozilla.org/show_bug.cgi?id=1720747.
           | 
           | Extension code is open-source, and linked already in other
           | comments:
           | 
           | - https://github.com/mozilla/firefox-translations
           | 
           | [1] https://github.com/marian-nmt/marian-dev
        
           | baobob wrote:
           | At least the code parts seem to be on GitHub:
           | https://github.com/browsermt
        
         | jarrell_mark wrote:
         | really great stuff! any plans for this on firefox mobile?
        
         | ainar-g wrote:
         | Thanks for the extension!
         | 
         | Are you planning on adding a "select some text - right click -
         | Translate in a tooltip" feature? It'd be extremely useful for
         | language learners.
        
           | HellsMaddy wrote:
           | +1. This was the first thing I tried to do and was surprised
           | this feature doesn't exist. Most often, I don't encounter
           | entire webpages in foreign languages, but rather small
           | snippets of text.
           | 
           | It seems there is an open issue for this:
           | https://github.com/mozilla/firefox-translations/issues/358
        
         | maxloh wrote:
         | What is the dataset used for training the model? Where did the
         | data come from?
        
           | jelmervdl wrote:
           | All of them are freely available. Most of them through mtdata
           | [1]. The exact list of the datasets is in the firefox-
           | translations-training pipeline configuration file [2].
           | 
           | [1] https://pypi.org/project/mtdata/
           | 
           | [2] https://github.com/mozilla/firefox-translations-
           | training/blo...
        
         | coder543 wrote:
         | Is this open source? I don't see a github link anywhere, and
         | I'm not sure if the models are freely usable.
         | 
         | EDIT: maybe this is it: https://github.com/mozilla/firefox-
         | translations-models
         | 
         | also some info here: https://github.com/mozilla/firefox-
         | translations-training
        
           | jelmervdl wrote:
           | Extension Github page: https://github.com/mozilla/firefox-
           | translations
        
         | ashkhn wrote:
         | Hi! This is an amazing project and will be really useful! Thank
         | you! I understand that the project is funded by EU so the focus
         | is on European languages but are there any plans to add CJK or
         | other languages ?
        
         | cf wrote:
         | What can we do as users or contributors to help improve the
         | accuracy of this extension? It's already amazing and I would
         | love to see it get even better.
        
       | schroeding wrote:
       | It passes the "Turkey" <=> "turkey" test: "In _Turkey_ they
       | sometimes eat _turkey_." => "In der _Türkei_ essen sie manchmal
       | _Truthahn_." :D
       | 
       | Super cool! Real-time translation, in the browser, running
       | locally! And sure, not state of the art / on the level of deepl,
       | but on the level of Google Translate, 2015ish, maybe? Amazing!
        
         | mahmutc wrote:
         | You should find another test case :)
         | https://www.aljazeera.com/news/2022/6/2/un-registers-turkiye...
        
           | riedel wrote:
           | They actually put an umlaut into the official name to really
           | make sure it won't be used correctly internationally?
        
             | mahmutc wrote:
             | I was thinking about the ISO-3166 part, but it seems the
             | standard already contains some names with special
             | characters, e.g. Réunion.
             | https://en.m.wikipedia.org/wiki/List_of_ISO_3166_country_cod...
        
           | BiteCode_dev wrote:
           | Just reformulate:
           | 
           | "Türkiye quit being called Turkey cold turkey"
        
         | Erlangen wrote:
         | However, "Turkey is not a common food in Turkey." != "Die
         | Türkei ist kein gemeinsames Essen in der Türkei."
        
           | mathstuf wrote:
           | Well, that's technically ambiguous in English too. I don't
           | think many people are eating their own country ;) .
        
             | refulgentis wrote:
             | Hmm, is it ambiguous then? Seems there's only one
             | interpretation
        
           | Lukas_Skywalker wrote:
           | Also, "common" should be translated to "üblich" instead of
           | "gemeinsam". "Gemeinsam" is more like "collective" as in
           | "a collective effort".
        
             | tralarpa wrote:
             | As usual, deepl doesn't disappoint.
        
       | [deleted]
        
       | jordemort wrote:
       | I love it. I wish it could translate Chinese to English.
        
       | collsni wrote:
       | Wow awesome!
        
       | simlevesque wrote:
       | I wonder why French is absent.
       | 
       | Meanwhile they have Persian which is not even in the EU.
        
         | cassepipe wrote:
         | Should the EU languages get preferential treatment ?
        
           | Mizza wrote:
           | "This project has received funding from the European Union's
           | Horizon 2020 research and innovation programme under grant
           | agreement No 825303 ."
        
         | geraltofrivia wrote:
         | I wouldn't comment on the absence of French vis-a-vis other
         | languages. It's just slightly surprising because English <->
         | French is honestly a very widely studied translation sub-task,
         | with an enormous amount of parallel corpora available for
         | training these models.
        
           | simlevesque wrote:
           | Well, there's this and the fact that it is the second most
           | popular language in the European Union, which sponsors the
           | project.
        
       | coffeeblack wrote:
        
         | tclancy wrote:
         | How do you think translations work, exactly?
        
       | mikevm wrote:
       | That's funny. I've just tried to translate "fuck you" to Russian
       | and I got "trakhat' tebia" while Google Translate gives the more
       | accurate "poshel na khui".
        
         | ainar-g wrote:
         | In my experience, DeepL is still the undefeated leader when it
         | comes to translating Russian obscenity, heh.
        
         | spitfire wrote:
         | Try "Russian warship, go fuck yourself!" instead. It should
         | work better.
        
         | numpad0 wrote:
         | Machine translations are as accurate as a trebuchet past 300
         | yards, just better than nothing. But they're a great tool so
         | long as the user is aware.
        
       | guerrilla wrote:
       | What I need from this is to be able to select text and just have
       | it translated in a tooltip (or whatever.) This is what I'm using
       | the Simple Translate Firefox add-on for but unfortunately it
       | sends data to Google.
        
         | filoleg wrote:
         | It would be nice to have something like that for desktop, but
         | on mobile, iOS handles it amazingly.
         | 
         | You can select text almost anywhere (from browser to even from
         | a screenshot/image; literally anywhere you are able to select
         | text), and in a tooltip above the word, one of the few options
         | is translate. I love the UX of it, as it is super intuitive and
         | unobtrusive, and works pretty much instantaneously. It runs
         | fully locally, no connection required. Slides a native OS pane
         | over the page to show possible translations along with
         | pronunciations and other extra info.
         | 
         | Sidenote: other features in that tooltip are pretty nifty too.
         | Aside from the obvious copy/cut/paste/share, I found "look up"
         | to be quite useful when i see a word I've not encountered
         | before. It pulls another native OS pane that shows dictionary
         | definitions and extra info like the wikipedia link. And the
         | actual dictionary definitions are local too afaik.
        
           | jeroenhd wrote:
           | Android had the same feature, assuming apps don't disable the
           | tooltip. Selecting text on my phone brings a nice context
           | menu for cut/copy/paste/search/translate/encrypt (that last
           | one was added by OpenKeychain, a PGP app).
           | 
           | It doesn't come with a dictionary built in, but the search
           | button becomes an online dictionary in a pinch. Any
           | dictionary app could extend the menu to add a local
           | dictionary, of course.
        
           | rahimnathwani wrote:
           | Android has this too. BUT:
           | 
           | When my phone is in portrait mode (almost always), I don't
           | see the translate option until I tap on the three dots.
           | 
           | The translation isn't instant. It takes a second to show up,
           | and then takes up the top of the screen.
           | 
           | I'd much prefer a UI similar to the Zhongwen Chrome
           | extension.
        
             | guerrilla wrote:
             | > It takes a second to show up, and then takes up the top
             | of the screen.
             | 
             | I think this could have to do with it not being local...
        
           | guerrilla wrote:
           | Yeah, Android has the same.
        
       | baobob wrote:
       | Awesome, tested the German model on dw.com, surprisingly fast and
       | accurate.
        
       ___________________________________________________________________
       (page generated 2022-06-02 23:00 UTC)