[HN Gopher] Language-Agnostic Bert Sentence Embedding
       ___________________________________________________________________
        
       Language-Agnostic Bert Sentence Embedding
        
       Author : theafh
       Score  : 47 points
       Date   : 2020-08-18 19:50 UTC (3 hours ago)
        
 (HTM) web link (ai.googleblog.com)
 (TXT) w3m dump (ai.googleblog.com)
        
       | lacker wrote:
       | After all the attention OpenAI got for making the GPT-3 API semi-
       | publicly-available, I wonder if Google has considered making some
       | of their research models available via API. It would be pretty
       | neat to be able to try these things out, rather than just reading
       | papers about them.
        
         | MiroF wrote:
         | You've actually got it backwards.
         | 
          | Releasing model weights has been common for a long time; it
          | was OpenAI that regressed and refused to release its GPT
          | models, culminating in GPT-3 being hidden behind a semi-
          | public API. The Google BERT models are released pre-trained
          | in full (as mentioned in the article), so you can easily play
          | around with them on your own (quick sketch below).
         | 
          | BERT isn't a forward (left-to-right) LM like GPT, so it's a
          | little harder for the uninitiated to play with - although
          | there are papers showing text generation with BERT.
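          | 
          | A minimal sketch of playing around with the pre-trained model
          | (the sentence-transformers model id here is my assumption - a
          | community port of the released LaBSE weights, not the
          | official TF Hub module):
          | 
          |     # pip install sentence-transformers
          |     import numpy as np
          |     from sentence_transformers import SentenceTransformer
          | 
          |     # Assumed community port of the pre-trained LaBSE
          |     # checkpoint.
          |     model = SentenceTransformer("sentence-transformers/LaBSE")
          | 
          |     sentences = ["Hello, world!", "Hola, mundo!",
          |                  "The weather is nice today."]
          |     emb = model.encode(sentences)
          | 
          |     # L2-normalize so cosine similarity is a plain dot
          |     # product; the translation pair should score near 1.
          |     emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
          |     print(np.round(emb @ emb.T, 2))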
        
           | tmabraham wrote:
           | To be fair, IIRC OpenAI did release GPT-2-large after some
           | community pressure, and it was somewhat doable for some
           | people to actually train from scratch. GPT-3 is too large, so
           | even if they released it, nobody apart from large companies
            | like Google could do anything with it. If anything, they've
            | made GPT-3 more open than it would have been if they had
            | just released the weights.
           | 
           | At least that's my understanding. Feel free to correct me if
           | I'm wrong.
        
             | Der_Einzige wrote:
             | They only released GPT-2 large after other language models
             | came out that were even larger, like T5...
        
             | liuliu wrote:
              | I don't know. It has 175 billion parameters, thus about
              | 650GiB of 32-bit floating-point parameters; if these are
              | bfloat16, it is a mere ~330GiB. A CPU is about 50 times
              | slower than a GPU for transformer models, so we are
              | looking at 4 to 8 minutes per inference (loading
              | parameters on demand from the SSD takes 200 seconds or
              | so and is probably the bottleneck here). If the
              | parameters can be loaded into memory (it seems you'd
              | need a >= 8-channel machine, with either unbuffered or
              | registered RAM), it will probably be 1 to 2 minutes per
              | inference.
              | 
              | Still in homelab territory for inference (barely).
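              | 
              | Rough arithmetic behind those figures (the ~2 GB/s SSD
              | read bandwidth is an assumption, not a measurement):
              | 
              |     params = 175e9
              | 
              |     fp32_gib = params * 4 / 2**30   # ~650 GiB
              |     bf16_gib = params * 2 / 2**30   # ~330 GiB
              | 
              |     # One full pass over the bf16 weights from an SSD
              |     # reading at ~2 GB/s; close to the ~200 s above.
              |     ssd_seconds = params * 2 / 2e9  # ~175 s
              | 
              |     print(fp32_gib, bf16_gib, ssd_seconds)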
        
             | MiroF wrote:
              | I'm not really up in arms about OpenAI's decision, but I
             | just don't want people to frame this as "why can't more
             | companies release models like OpenAI does" when in reality
             | the opposite is pretty much true.
             | 
             | GPT-2 wasn't really feasible for most actors to train from
             | scratch, otherwise it would have been released by a third
              | party. GPT-3 is technically feasible for a single actor
              | to do inference with, although only barely in practice.
        
               | tmabraham wrote:
                | I agree with your comment here, but I think OpenAI has
                | done the best they could for models of this type and
                | size. So I wouldn't say they've "regressed and refused
                | to release its GPT models"; they just had to take a
                | different route with them.
        
           | lettergram wrote:
           | I firmly believe what OpenAI is actually doing is building a
           | dataset.
        
           | chaoz_ wrote:
            | However, @lacker makes a valid point that most research
            | papers lack an interactive, well-designed API for people
            | unfamiliar with the domain.
        
           | colordrops wrote:
           | It's really ironic that they are regressing and keeping their
           | research behind a curtain when the entire pretense for their
           | existence (and their name) is to open things up.
        
             | taneq wrote:
             | It's not like it's a new across-the-board policy for them.
             | They were just being cautious in this particular case
             | because they were worried about social ramifications.
              | Arguably, some other research should also have been made
              | less readily available in this way (think deepfakes, for
              | example).
        
             | est31 wrote:
              | It's easier to hire people for a (claimed) mission-driven
              | company than for an evil business corp that builds killer
              | machines for the US military.
        
       ___________________________________________________________________
       (page generated 2020-08-18 23:00 UTC)