[HN Gopher] Mozilla releases local machine translation tools as ... ___________________________________________________________________ Mozilla releases local machine translation tools as part of Project Bergamot Author : Vinnl Score : 355 points Date : 2022-06-02 16:23 UTC (6 hours ago) (HTM) web link (blog.mozilla.org) (TXT) w3m dump (blog.mozilla.org) | boberoni wrote: | For i18n in my own projects, I typically use tools like gettext | and involve lots of volunteers to do the translations. I might | try out these neural machine translation tools to see how they | fare. I also wonder if these machine translation tools are | trained on a corpus of gettext datasets. | bogwog wrote: | This is awesome, but... | | > This set of requirements posed a number of technological | challenges to the team: the translation engine was entirely | written in programming languages that compile to native code. We | needed a way to streamline the distribution of the project in | order to avoid the overhead involved in providing builds | compatible with all platforms supported by Firefox -- that would | be impracticable to scale and maintain. | | Does Firefox really support so many different platforms and archs | that CI builds are unrealistic? | jelmervdl wrote: | The upside of using WASM is that the extension itself can be | easily ported to other browsers and platforms. The UI uses | Firefox-specific APIs but the parts that take the HTML from a | page and push it through the translation engine would also work | in any Chrome-based browser. | | (Edit: also free sandboxing of a blob of C++ code that needs to | handle arbitrary input from the web!) | dblohm7 wrote: | (Former Mozilla employee, here) | | I'm completely speculating, but it's probably a matter of not | wanting to complicate iterating on the translation engine by | introducing a bunch of cruft from the Firefox build system | (which, though it uses GNU make under the hood, is very much | bespoke and complicated).
| | Since the translation engine is intended to run on a product | that hosts WASM, they might as well just build to that. | whinvik wrote: | Can we use this on mobile? | jeroenhd wrote: | The demo site works on mobile if you let it load the necessary | content so if you're speaking from a web dev point of view: | definitely. | | As for the addon, on Android you'll need to install an unstable | version of Firefox and configure a custom addon list in an | addons.mozilla.org account that includes it so you can download | it. | | On iOS there isn't any option to download addons as far as I'm | aware. On mobile Linux environments everything should work like | on desktop. | djvdq wrote: | You can't download any addon for Firefox on iOS because it's | almost Safari, only looking a bit different. All browsers on | iOS have to use WebKit so FF is not really FF here on iOS. | jelmervdl wrote: | I think the Firefox extension might not work on mobile | because it hooks into some undocumented addon APIs to draw | that translation bar UI. Those might not be available on | mobile. | | The translation code itself should work on mobile. It's just | some JavaScript & wasm (albeit with SIMD instructions not | implemented in Safari's WASM VM...) | Vinnl wrote: | I just installed the extension on Fenix Nightly and indeed, | it does not work. | option wrote: | What's wrong with using cloud without sending any user id with | the request? | vanilla_nut wrote: | If local translation can help me use a website without a query | to some cloud server... who needs the cloud? No backend that | will experience downtime, and someday be decommissioned. No | money sink of cloud processing pressuring the product to | advertise or monetise in unscrupulous ways. | | I'm sure cloud processing is better in many ways. But if this | is "good enough" I'd rather just do it all locally.
| [deleted] | chungy wrote: | There's likely far more identifiable information in the actual | text than a user ID provides. | no_time wrote: | In the current status quo you either make use of an API by | identifying yourself with a key or a browser session that is | fingerprintable in a gazillion ways. There is no such thing as | "not sending user ID" or if there is, it has a totally | negligible reach. | drewzero1 wrote: | Cloud assumes a constant, reliable internet connection, which | is not the reality in most of the world. (Nor is it always | desirable.) | jffry wrote: | If the data never leaves your device, then a third-party | service never gets the opportunity to leak or misuse it. This | is far more private. | | How many stories have you heard about breaches due to | accidentally mis-configured logging in web services? Also in | the news lately was Twitter misusing 2FA phone numbers for | advertising purposes. | toper-centage wrote: | What if what I'm trying to translate is sensitive information | in itself? | kevin_thibedeau wrote: | "The telescreen received and transmitted simultaneously. Any | sound that Winston made, above the level of a very low whisper, | would be picked up by it; moreover, so long as he remained | within the field of vision which the metal plaque commanded, he | could be seen as well as heard. There was of course no way of | knowing whether you were being watched at any given moment." | 0des wrote: | "How often, or on what system, the Thought Police plugged in | on any individual wire was guesswork. It was even conceivable | that they watched everybody all the time." | no_time wrote: | This is incredible and super important. For all the blunders of | Mozilla in the last decade, they still have some great projects. | I am also grateful to them for not scrapping Common Voice. | _trampeltier wrote: | Also important, because now it seems at least in Germany, on | Google translate, there is the translate website button | missing.
From Switzerland I saw the button lately when I tried. | I don't know if it is because it can be used to reach censored (Russian) sites. | My company blocks Google Translate anyway, probably because of | the same reason. | no_time wrote: | Try copying the URL into the translator's text field. It's | how I've been using it for years. | croes wrote: | Are you sure? | | Under Google Übersetzer I see three buttons: Text, Dokumente, | Websites | [deleted] | coding123 wrote: | In the long run, I am a super huge fan of Mozilla and Firefox. | I am using it right now. After a 10 year stint of using Chrome | exclusively I now use Firefox as my main driver. Unfortunately | I still need to keep Chrome around for weird situations where | the website developer only tested in Chrome (Yes this still | exists. A shopping cart in a popular website - cough Home Depot | cough cough - that recently failed me in Firefox worked in | Chrome. I haven't tried in a couple weeks hopefully that is | fixed.) | Shadonototra wrote: | it's not their project, all they did was to write a form in JS | | the whole project is an EU-funded one, all done in the | University of Edinburgh | | https://cordis.europa.eu/project/id/825303 | | you giving full credit to Mozilla is dishonest, to say the | least | | it aligns with their past projects, including using Mullvad and | slapping a Mozilla sticker on top of it to claim it as their | own | | also it is super funny to read this: | | > H2020-EU.2.1.1. - INDUSTRIAL LEADERSHIP - Leadership in | enabling and industrial technologies - Information and | Communication Technologies (ICT) MAIN PROGRAMME | | little do they know, the EU never learns | stuartd wrote: | All they did was 'write a form in JS'??? | | > Our solution to that was to develop a high-level API around | the machine translation engine, port it to WebAssembly, and | optimize the operations for matrix multiplication to run | efficiently on CPUs.
| nyanpasu64 wrote: | It's unfortunate that it doesn't translate Japanese, and reading | Japanese-only resources is a common hurdle in the retro game | modding/development community. | simonmales wrote: | Great that you contribute your own language pairs. | obert wrote: | The sooner we move AI to 127.0.0.1 the better, enough with The | Cloud powerhouses. | | Yes there's work to be done, resilience, power efficiency, | responsiveness, but it's the right direction for everything that | involves private computing. | Vinnl wrote: | I've been using the extension [1] for a bit and, while it doesn't | support too many languages, for the ones it does it's pretty cool | to have it all running locally. | | [1] https://addons.mozilla.org/firefox/addon/firefox- | translation... | baobob wrote: | Do you know what the pipeline looks like for new language pairs | being added? This is really, really, really awesome | | I'm also immediately curious about using it headless outside | the browser | jelmervdl wrote: | The training pipeline is also on Github! [1] | | I was experimenting with running the wasm version of | bergamot-translator (the translation engine used by the | addon) in node [2]. | | However, if you want more performance, using the Python | library [3] or the native C++ interface [4] gets you further | because the wasm build is limited to a single thread and thus | a blocking interface, and can't use all the processor | specific optimisations that are in the native builds. | | EDIT: Another option is using translateLocally [5], which is | a Qt desktop app based on bergamot-translator. It has a | native messaging API that is designed as a much faster | alternative to the wasm build for browser extensions, but it | can also be used from Python [6]. | | [1] https://github.com/mozilla/firefox-translations-training | | [2] https://gist.github.com/jelmervdl/a4c8b6b92ad88a885e1cbd5 | 1c6... | | [3] https://colab.research.google.com/drive/1AHpgewVJBFaupwAb | Zq0... 
| | [4] https://github.com/browsermt/bergamot- | translator/blob/main/a... | | [5] https://github.com/XapaJIaMnu/translateLocally | | [6] https://github.com/XapaJIaMnu/translateLocally/blob/maste | r/s... | clairity wrote: | neat, but it looks like it was just released, so how were you | using it before? | | as an aside, pretty sad to see the project page, | https://browser.mt/ , requiring not just javascript but | specifically google connections to work. to 5 different google | properties, no less. | Vinnl wrote: | I work at Mozilla, so got a sneak preview (and also the first | bugs) :) | | (Of course technically the work was out there in the open | already, since it's Mozilla.) | | Agreed about the Bergamot website. I suspect it's not by | Mozilla, but I'll see if I can ask someone to take a look, as | I don't think all those connections should be necessary. | edko wrote: | Do you know if this will be open-sourced, or if the repo is | already available? | space_fountain wrote: | I think this is probably the source | https://github.com/mozilla/firefox-translations | | edit: and for the actual translations | https://github.com/mozilla/bergamot-translator | clairity wrote: | awesome, thanks! i also suspect it's by the EU coalition | behind bergamot, so probably beyond mozilla's jurisdiction, | but it doesn't hurt to ask. | 0des wrote: | .mt huh that's a new one for me | lovelearning wrote: | Stuck at "Loading translation engine..." for a long time. Tried | German and Spanish. Can't tell if it's downloading some model | data or something's failed. I suggest some kind of progress | indicator. | ainar-g wrote: | Weird. I had a numeric progress indicator, and the model got | downloaded in just a couple of seconds. | Vinnl wrote: | You might want to report that here, if it's not reported | already: https://github.com/mozilla/firefox-translations/issues | jeroenhd wrote: | Looks lovely!
Offline translations are very welcome in a world | where the most important translation engines are also run by the | world's biggest data hoarders. | | Sadly, the extension either doesn't work on mobile or Mozilla | couldn't be bothered to add it to the whitelist. | andrenatal1 wrote: | Hi, I am part of the team who developed this and the author of | the article. You can ask me anything about it if you have | questions. | msdrigg wrote: | Chinese language is what I most commonly want to translate. Is | there any planned support for this? | unicornporn wrote: | Google Translate code is present on many web sites to provide | automatic translations of text. Could your translate code be | uploaded to a server and embedded in a web page to provide the | same functionality? | jelmervdl wrote: | I'm not aware of any actively maintained projects that give | you this out of the box, but these two could be starting | points for such a project. | | Mozilla implemented a REST service based on (an earlier | version of) bergamot-translator [1]. You could use that as a | replacement for the WASM component in the addon's code. | | I also know of some full-page translation demo code that uses | the Python bindings of bergamot-translator [2]. That's | basically a web proxy a la Google Translate. | | Lastly, marian, the translation software that's being used, | has a web server as well [3]. It does not support HTML | though. | | EDIT: see also my earlier comment for using it with Node or | Python [4], which you could use to implement a simple web | API. | | [1] https://github.com/mozilla/translation-service | | [2] https://github.com/jerinphilip/tagtransfer | | [3] https://marian-nmt.github.io/docs/#web-server | | [4] https://news.ycombinator.com/item?id=31599231 | dnc wrote: | Hi, | | This is an awesome project, congratulations! | | Could you share details about the machine translation engine | that is used (or where to find out more about it)?
Are there | any plans to open source the extension code (with the | WebAssembly optimizations that are mentioned in the article)? | | Thanks. | jphilip wrote: | A fork of marian-dev[1] is the underlying machine-translation | engine: | | - https://github.com/browsermt/marian-dev | | Development of the higher-level code that wraps marian-dev to | make it suitable for the browser extension happens at: | | - https://github.com/browsermt/bergamot-translator | | Some of the WebAssembly optimizations are available in | bergamot-translator/marian-dev. The rest are in the Firefox source | code. A starting point could be | https://bugzilla.mozilla.org/show_bug.cgi?id=1720747. | | Extension code is open-source, and linked already in other | comments: - https://github.com/mozilla/firefox-translations | | [1] https://github.com/marian-nmt/marian-dev | baobob wrote: | At least the code parts seem to be on GitHub: | https://github.com/browsermt | jarrell_mark wrote: | really great stuff! any plans for this on firefox mobile? | ainar-g wrote: | Thanks for the extension! | | Are you planning on adding a "select some text - right click - | Translate in a tooltip" feature? It'd be extremely useful for | language learners. | HellsMaddy wrote: | +1. This was the first thing I tried to do and was surprised | this feature doesn't exist. Most often, I don't encounter | entire webpages in foreign languages, but rather small | snippets of text. | | It seems there is an open issue for this: | https://github.com/mozilla/firefox-translations/issues/358 | maxloh wrote: | What is the dataset used for training the model? Where did the | data come from? | jelmervdl wrote: | All of them are freely available. Most of them through mtdata | [1]. The exact list of the datasets is in the firefox- | translations-training pipeline configuration file [2]. | | [1] https://pypi.org/project/mtdata/ | | [2] https://github.com/mozilla/firefox-translations- | training/blo... | coder543 wrote: | Is this open source?
I don't see a github link anywhere, and | I'm not sure if the models are freely usable. | | EDIT: maybe this is it: | https://github.com/mozilla/firefox-translations-models | | also some info here: | https://github.com/mozilla/firefox-translations-training | jelmervdl wrote: | Extension Github page: | https://github.com/mozilla/firefox-translations | ashkhn wrote: | Hi! This is an amazing project and will be really useful! Thank | you! I understand that the project is funded by EU so the focus | is on European languages but are there any plans to add CJK or | other languages? | cf wrote: | What can we do as users or contributors to help improve the | accuracy of this extension? It's already amazing and would love | to see it get even better. | schroeding wrote: | It passes the "Turkey" <=> "turkey" test: "In _Turkey_ they | sometimes eat _turkey_." => "In der _Türkei_ essen sie manchmal | _Truthahn_." :D | | Super cool! Real-time translation, in the browser, running | locally! And sure, not state of the art / on the level of deepl, | but on the level of Google Translate, 2015ish, maybe? Amazing! | mahmutc wrote: | You should find another test case :) | https://www.aljazeera.com/news/2022/6/2/un-registers-turkiye... | riedel wrote: | They actually put an umlaut into the official name to really | make sure it won't be used correctly internationally? | mahmutc wrote: | I was thinking about the ISO-3166 part, but it seems the standard | already contains some names with special characters, e.g. | Réunion. https://en.m.wikipedia.org/wiki/List_of_ISO_3166_c | ountry_cod... | BiteCode_dev wrote: | Just reformulate: | | "Türkiye quit being called Turkey cold turkey" | Erlangen wrote: | However, "Turkey is not a common food in Turkey." != "Die | Türkei ist kein gemeinsames Essen in der Türkei." | mathstuf wrote: | Well, that's technically ambiguous in English too. I don't | think many people are eating their own country ;) . | refulgentis wrote: | Hmm, is it ambiguous then?
Seems there's only one | interpretation | Lukas_Skywalker wrote: | Also, „common“ should be translated to „üblich“ instead of | „gemeinsam“. „Gemeinsam“ is more like „collective“ as in | „a collective effort“. | tralarpa wrote: | As usual, deepl doesn't disappoint. | [deleted] | jordemort wrote: | I love it. I wish it could translate Chinese to English. | collsni wrote: | Wow awesome! | simlevesque wrote: | I wonder why French is absent. | | Meanwhile they have Persian which is not even in the EU. | cassepipe wrote: | Should the EU languages get preferential treatment? | Mizza wrote: | "This project has received funding from the European Union's | Horizon 2020 research and innovation programme under grant | agreement No 825303." | geraltofrivia wrote: | I wouldn't comment on the absence of French vis-a-vis other | languages. It's just slightly surprising because English <-> | French is honestly a very widely studied translation sub-task, | with an enormous amount of parallel corpora available for | training these models. | simlevesque wrote: | Well, there's this and the fact that it is the second most | popular language in the European Union, which sponsors the | project. | coffeeblack wrote: | tclancy wrote: | How do you think translations work, exactly? | mikevm wrote: | That's funny. I've just tried to translate "fuck you" to Russian | and I got "trakhat' tebia" while Google Translate gives the more | accurate "poshel na khui". | ainar-g wrote: | In my experience, DeepL is still the undefeated leader when it | comes to translating Russian obscenity, heh. | spitfire wrote: | Try "Russian warship, go fuck yourself!" instead. It should | work better. | numpad0 wrote: | Machine translations are as accurate as a trebuchet past 300 | yards, just better than nothing. But they're a great tool so | long as the user is aware. | guerrilla wrote: | What I need from this is to be able to select text and just have | it translated in a tooltip (or whatever.)
This is what I'm using | the Simple Translate Firefox add-on for but unfortunately it | sends data to Google. | filoleg wrote: | It would be nice to have something like that for desktop, but | on mobile, iOS handles it amazingly. | | You can select text almost anywhere (from browser to even from | a screenshot/image; literally anywhere you are able to select | text), and in a tooltip above the word, one of the few options | is translate. I love the UX of it, as it is super intuitive and | unobtrusive, and works pretty much instantaneously. It runs | fully locally, no connection required. Slides a native OS pane | over the page to show possible translations along with | pronunciations and other extra info. | | Sidenote: other features in that tooltip are pretty nifty too. | Aside from the obvious copy/cut/paste/share, I found "look up" | to be quite useful when I see a word I've not encountered | before. It pulls another native OS pane that shows dictionary | definitions and extra info like the Wikipedia link. And the | actual dictionary definitions are local too afaik. | jeroenhd wrote: | Android has the same feature, assuming apps don't disable the | tooltip. Selecting text on my phone brings a nice context | menu for cut/copy/paste/search/translate/encrypt (that last | one was added by OpenKeychain, a PGP app). | | It doesn't come with a dictionary built in, but the search | button becomes an online dictionary in a pinch. Any | dictionary app could extend the menu to add a local | dictionary, of course. | rahimnathwani wrote: | Android has this too. BUT: | | When my phone is in portrait mode (almost always), I don't | see the translate option until I tap on the three dots. | | The translation isn't instant. It takes a second to show up, | and then takes up the top of the screen. | | I'd much prefer a UI similar to the Zhongwen Chrome | extension. | guerrilla wrote: | > It takes a second to show up, and then takes up the top | of the screen.
| | I think this could have to do with it not being local... | guerrilla wrote: | Yeah, Android has the same. | baobob wrote: | Awesome, tested the German model on dw.com, surprisingly fast and | accurate. ___________________________________________________________________ (page generated 2022-06-02 23:00 UTC)