[HN Gopher] Using aligned word vectors for instant translations ...
       ___________________________________________________________________
        
       Using aligned word vectors for instant translations with Python and
       Rust
        
       Author : beau
       Score  : 54 points
       Date   : 2021-06-10 20:22 UTC (2 hours ago)
        
 (HTM) web link (instantdomainsearch.com)
 (TXT) w3m dump (instantdomainsearch.com)
        
       | PaulHoule wrote:
       | Nice example.
       | 
       | The short text and the fact that your application would tolerate
       | or celebrate catchy neologisms play to fasttext's strengths.
        
         | beau wrote:
         | Thank you!
        
       | beau wrote:
       | We've released the underlying Rust implementation here:
       | https://github.com/InstantDomain/instant-distance with Python
       | bindings at https://pypi.org/project/instant-distance -- feedback
       | welcome!
        
         | Fiahil wrote:
         | I've not much to say on the actual lib, it seems great!
         | However, don't feel compelled to put all your rust code into a
         | single lib.rs. You can split your work into several files and
         | use 'pub use' and 'mod' in lib.rs to re-export your functions &
         | types into a public API of your choosing.
         | 
         | cargo check and format time might also slightly improve!
        
       | denysvitali wrote:
       | > For example, here are the results of translating the English
       | word "hello":
       | 
       | > Language: fr, Translation: bonjours
       | 
       | > Language: fr, Translation: bonsoir
       | 
       | > Language: fr, Translation: salutations
       | 
       | > Language: it, Translation: buongiorno
       | 
       | > Language: it, Translation: buonanotte
       | 
       | > Language: fr, Translation: rebonjour
       | 
       | > Language: it, Translation: auguri
       | 
       | > Language: fr, Translation: bonjour,
       | 
       | > Language: it, Translation: buonasera
       | 
       | > Language: it, Translation: chiamatemi
       | 
       | Is it just me, or are these machine translations worse than ...
       | Google Translate?
        
         | beau wrote:
         | These results are less accurate than Google Translate. But they
         | are far faster to get, and far less expensive to generate:
         | https://cloud.google.com/translate/pricing -- our goal here is
         | speed. We want to search through many possibilities as
         | quickly as possible.
         | 
         | The word vectors have been aligned in multiple languages. Using
         | an approximate nearest neighbor search we are able to find the
         | nearest vector to the input in multiple languages very quickly.
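         | 
         | A minimal sketch of that lookup, not the code from the post: it
         | uses an exact linear scan with cosine similarity in place of an
         | approximate index, and the three-dimensional vectors and the
         | names VECTORS, cosine and nearest are invented for illustration.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. Real aligned
# fastText vectors are much longer and are loaded from the .vec files.
VECTORS = {
    ("en", "hello"): [0.9, 0.1, 0.0],
    ("fr", "bonjour"): [0.8, 0.2, 0.1],
    ("it", "buongiorno"): [0.7, 0.3, 0.0],
    ("fr", "fromage"): [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, k=2):
    """Return the k (language, word) keys most similar to the query.

    This is an exact scan over every vector. An approximate nearest
    neighbour index avoids scoring every entry on large vocabularies.
    """
    ranked = sorted(vectors, key=lambda key: cosine(query, vectors[key]), reverse=True)
    return ranked[:k]


# Because the vectors share one aligned space, an English query can be
# compared directly against French and Italian entries.
query = VECTORS[("en", "hello")]
candidates = {key: vec for key, vec in VECTORS.items() if key[0] != "en"}
results = nearest(query, candidates)
```

         | Here results is [("fr", "bonjour"), ("it", "buongiorno")]. An
         | approximate index aims to return much the same neighbours
         | without scoring every vector.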
         | 
         | To keep the example simple, we did not try to filter the data
         | through hand-built language dictionaries. In fact, we simply
         | drop words in other languages that also appear in the English
         | .vec file. Words like "ciao" appear frequently enough in
         | otherwise English sentences that the example code drops them
         | from Italian, so "ciao" is not shown in the results:
         | 
         | % curl -s "https://dl.fbaipublicfiles.com/fasttext/vectors-aligned/wiki..." | grep -n ciao
         | 50393:ciao 0.0120 ...
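         | 
         | To illustrate why "ciao" disappears, here is a rough sketch of
         | the drop-if-in-English filter. The two tiny .vec snippets are
         | made up, and load_vec is a hypothetical helper that assumes the
         | usual fastText text layout: a "count dim" header line, then one
         | word followed by its values per line.

```python
def load_vec(text):
    """Parse .vec text: a 'count dim' header, then 'word v1 v2 ...' rows."""
    lines = text.strip().splitlines()
    vectors = {}
    for line in lines[1:]:  # skip the 'count dim' header
        word, *values = line.split()
        vectors[word] = [float(v) for v in values]
    return vectors


# Made-up miniature files standing in for the real English and Italian data.
EN_VEC = "3 2\nhello 0.1 0.2\nciao 0.3 0.4\ncheese 0.5 0.6\n"
IT_VEC = "3 2\nciao 0.3 0.5\nbuongiorno 0.2 0.2\nauguri 0.9 0.1\n"

english = load_vec(EN_VEC)
italian = load_vec(IT_VEC)

# Drop every Italian entry whose word also appears in the English file.
# "ciao" is in both, so it is removed along with any true English loanwords.
italian_only = {word: vec for word, vec in italian.items() if word not in english}
```

         | After filtering, italian_only holds only "buongiorno" and
         | "auguri"; "ciao" is gone even though it is a valid Italian word.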
         | 
         | One improvement would be to filter out any words that do not
         | appear in a hand-curated dictionary instead of filtering out
         | words that already appear in English. We decided not to show
         | how to do this because we'd already introduced a few concepts,
         | like aligned word vectors and approximate nearest neighbour
         | searches, and wanted to keep the example as simple as possible.
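         | 
         | The dictionary-based filter could look roughly like this.
         | ITALIAN_DICTIONARY is a hypothetical stand-in for a hand-curated
         | word list, and keep_dictionary_words is an illustrative name,
         | not something from the post.

```python
# Hypothetical hand-curated word list. A real one would be loaded from a
# dictionary file rather than written inline.
ITALIAN_DICTIONARY = {"ciao", "buongiorno", "buonasera"}


def keep_dictionary_words(words, dictionary):
    """Keep only candidates that appear in the curated dictionary."""
    return [word for word in words if word.lower() in dictionary]


# Made-up candidate list: a valid word shared with English text, a valid
# Italian word, an English word, and a token with stray punctuation.
candidates = ["ciao", "buongiorno", "hello", "buonasera,"]
filtered = keep_dictionary_words(candidates, ITALIAN_DICTIONARY)
```

         | With this allow-list approach "ciao" survives, while "hello" and
         | the punctuation-suffixed "buonasera," are dropped because
         | neither is an exact dictionary entry.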
        
         | toxik wrote:
         | Google Translate is state of the art, so I'm not sure why that
         | would be surprising. That said, is there something wrong with
         | the translations offered?
        
           | dataflow wrote:
           | > That said, is there something wrong with the translations
           | offered?
           | 
           | I think in French hello = "bonjour" and hi = "salut"... not
           | sure where "bonjours" and "salutations" came from.
        
             | T-A wrote:
             | The Italian "auguri" means "best wishes"; "chiamatemi"
             | means "call me". Neither is a plausible translation of
             | "hello". The obvious one, "ciao", is missing.
        
         | ampdepolymerase wrote:
         | It would be better to run the vectors through an attention
         | layer if you want sentence to sentence translation.
        
       | aitk wrote:
       | At first glance at the title, I thought it was translating Python
       | code to Rust code.
        
       ___________________________________________________________________
       (page generated 2021-06-10 23:00 UTC)