[HN Gopher] Using aligned word vectors for instant translations ...
___________________________________________________________________

Using aligned word vectors for instant translations with Python and
Rust

Author : beau
Score  : 54 points
Date   : 2021-06-10 20:22 UTC (2 hours ago)

(HTM) web link (instantdomainsearch.com)
(TXT) w3m dump (instantdomainsearch.com)

| PaulHoule wrote:
| Nice example.
|
| The short text and the fact that your application would tolerate
| or celebrate catchy neologisms play to fasttext's strengths.

| beau wrote:
| Thank you!

| beau wrote:
| We've released the underlying Rust implementation here:
| https://github.com/InstantDomain/instant-distance with Python
| bindings at https://pypi.org/project/instant-distance -- feedback
| welcome!

| Fiahil wrote:
| I've not much to say on the actual lib, it seems great!
| However, don't feel compelled to put all your Rust code into a
| single lib.rs. You can split your work into several files and
| use 'pub use' and 'mod' in lib.rs to re-export your functions &
| types into a public API of your choosing.
|
| cargo check and format time might also slightly improve!

| [deleted]

| denysvitali wrote:
| > For example, here are the results of translating the English
| word "hello":
|
| > Language: fr, Translation: bonjours
|
| > Language: fr, Translation: bonsoir
|
| > Language: fr, Translation: salutations
|
| > Language: it, Translation: buongiorno
|
| > Language: it, Translation: buonanotte
|
| > Language: fr, Translation: rebonjour
|
| > Language: it, Translation: auguri
|
| > Language: fr, Translation: bonjour,
|
| > Language: it, Translation: buonasera
|
| > Language: it, Translation: chiamatemi
|
| Is it just me or are these machine translations worse than ...
| Google Translate?

| beau wrote:
| These results are less accurate than Google Translate. But they
| are far faster to get, and far less expensive to generate:
| https://cloud.google.com/translate/pricing -- our goal here is
| speed. We want to search through many possibilities as quickly
| as possible.
|
| The word vectors have been aligned in multiple languages. Using
| an approximate nearest neighbor search, we are able to find the
| nearest vector to the input in multiple languages very quickly.
|
| To keep the example simple, we did not try to filter the data
| through hand-built language dictionaries. In fact, we simply
| drop words in other languages that also appear in the English
| .vec file. Words like "ciao" appear frequently enough in
| otherwise English sentences that the example code drops it from
| Italian, so it is not shown in the results:
|
| % curl -s "https://dl.fbaipublicfiles.com/fasttext/vectors-aligned/wiki..." | grep -n ciao
| 50393:ciao 0.0120 ...
|
| One improvement would be to filter out any words that do not
| appear in a hand-curated dictionary instead of filtering out
| words that already appear in English. We decided not to show
| how to do this because we'd already introduced a few concepts,
| like aligned word vectors and approximate nearest neighbour
| searches, and wanted to keep the example as simple as possible.

| toxik wrote:
| Google Translate is state of the art, so I'm not sure why that
| would be surprising. That said, is there something wrong with
| the translations offered?

| dataflow wrote:
| > That said, is there something wrong with the translations
| offered?
|
| I think in French hello = "bonjour" and hi = "salut"... not
| sure where "bonjours" and "salutations" came from.

| T-A wrote:
| The Italian "auguri" means "best wishes"; "chiamatemi"
| means "call me". Neither is a plausible translation of
| "hello". The obvious one, "ciao", is missing.

| ampdepolymerase wrote:
| It would be better to run the vectors through an attention
| layer if you want sentence to sentence translation.

| aitk wrote:
| At first glance at the title, I thought it was translating Python
| code to Rust code.
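
For readers following beau's explanation, here is a minimal sketch of the idea: look up the English word's vector, drop foreign words that also appear in the English vocabulary, and return the nearest remaining vectors. The three-dimensional vectors and the function names are made up for illustration, and the sketch uses an exact brute-force search with cosine distance rather than the approximate nearest neighbor search the article relies on, so it shows the filtering logic only, not the speed.

```python
import math

# Toy "aligned" vectors. These values are hypothetical and far smaller
# than the real fastText aligned vectors.
EN = {"hello": [0.9, 0.1, 0.0], "ciao": [0.88, 0.12, 0.02], "dog": [0.0, 0.2, 0.9]}
FR = {"bonjour": [0.85, 0.15, 0.05], "salut": [0.8, 0.2, 0.0], "chien": [0.05, 0.25, 0.9]}
IT = {"ciao": [0.9, 0.1, 0.01], "buongiorno": [0.8, 0.2, 0.1], "cane": [0.0, 0.25, 0.88]}


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def build_index(english, others):
    """Collect (lang, word, vector) entries from the non-English vocabularies,
    dropping any word that also appears in the English vocabulary."""
    index = []
    for lang, vectors in others.items():
        for word, vec in vectors.items():
            if word in english:
                continue  # e.g. "ciao" is dropped from Italian
            index.append((lang, word, vec))
    return index


def translate(word, english, index, k=3):
    """Return the k entries nearest to the English word's vector.
    Brute force: every entry is compared, unlike an approximate search."""
    query = english[word]
    ranked = sorted(index, key=lambda entry: cosine_distance(query, entry[2]))
    return [(lang, w) for lang, w, _ in ranked[:k]]


if __name__ == "__main__":
    index = build_index(EN, {"fr": FR, "it": IT})
    for lang, w in translate("hello", EN, index):
        print(f"Language: {lang}, Translation: {w}")
```

With this filter, "ciao" can never be returned for Italian, which matches the behaviour T-A points out. Swapping the `word in english` check for a lookup in a hand-curated dictionary would be the improvement beau mentions.
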
| [deleted]
___________________________________________________________________
(page generated 2021-06-10 23:00 UTC)