[HN Gopher] T5: The Text-to-Text Transfer Transformer
       ___________________________________________________________________
        
       T5: The Text-to-Text Transfer Transformer
        
       Author : theafh
       Score  : 74 points
       Date   : 2020-02-24 21:05 UTC (1 hour ago)
        
 (HTM) web link (ai.googleblog.com)
 (TXT) w3m dump (ai.googleblog.com)
        
       | kitsune_ wrote:
       | No mention of MASS by Microsoft? It was afaik one of the first
       | pretraining schemes for a full transformer outside of XLM.
       | 
       | Imho that's a bit unfortunate, as is calling just the decoder
       | or the encoder of a transformer "a transformer", as has
       | happened with GPT and BERT, which now forces people to use
       | "full transformer" or phrases like the title of the blog post.
        
         | craffel wrote:
         | We include MASS in our empirical survey (see e.g. section 3.3.2
         | of our paper, https://arxiv.org/pdf/1910.10683.pdf). FWIW,
         | people were pre-training Transformers before MASS, e.g.
         | "Improving Language Understanding by Generative Pre-Training"
         | by Radford et al. from 2018. Even further back, "Semi-
         | Supervised Sequence Learning" by Dai et al. describe pre-
         | training an RNN encoder-decoder model for subsequent transfer.
        
           | kitsune_ wrote:
           | But Radford et al. are just pretraining the decoder, which
           | is qualitatively different from a seq2seq approach such as
           | MASS. If we go by the original paper from Vaswani, then
           | "pretraining a transformer" imho should only ever have
           | meant pretraining both the encoder and the decoder.
           | Obviously that ship has sailed.
        
       | j0e1 wrote:
       | Link to the paper: https://arxiv.org/abs/1910.10683
        
       | draw_down wrote:
       | What does "cola sentence" mean in the first example GIF?
       | (https://1.bp.blogspot.com/-o4oiOExxq1s/Xk26XPC3haI/AAAAAAAAF...)
        
       | modeless wrote:
       | The trivia game (https://t5-trivia.glitch.me/) needs a little
       | work.
       | 
       | > Q: How did Gus Grissom, Ed White and Roger B. Chaffee die in
       | 1967?
       | 
       | > You: "Apollo 1" WRONG
       | 
       | > T5: "They were killed when their Apollo 1 spacecraft exploded"
       | WRONG
       | 
       | > Correct answer: burned to death
       | 
       | > Q: Which Alpine peak is known in Italy as Monte Cervino?
       | 
       | > You: "Monte Cervino" CORRECT
       | 
       | I wonder how many of the problems with this game could be fixed
       | by applying T5 itself to the answer grading.
        
         | craffel wrote:
         | Yes, unfortunately we have to rely on the very brittle "exact
         | match" method of evaluating whether an answer is correct. FWIW
         | and perhaps surprisingly, this is the primary way question-
         | answering systems are evaluated in common benchmarks. I totally
         | agree that fine-tuning T5 for answer grading would be super
         | interesting!
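         | 
         | If anyone is curious, here is a rough sketch of the SQuAD-
         | style normalize-then-compare exact match scoring
         | (illustrative only, not necessarily our exact code):
         | 
         |     import re, string
         | 
         |     def normalize(s):
         |         # lowercase, drop punctuation/articles, squash spaces
         |         s = ''.join(c for c in s.lower()
         |                     if c not in string.punctuation)
         |         s = re.sub(r'\b(a|an|the)\b', ' ', s)
         |         return ' '.join(s.split())
         | 
         |     def exact_match(prediction, gold_answers):
         |         # correct only on an exact normalized string match
         |         return any(normalize(prediction) == normalize(g)
         |                    for g in gold_answers)
         | 
         |     exact_match("Apollo 1", ["burned to death"])  # False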
        
           | modeless wrote:
           | I think it makes some sense to evaluate models like this, as
           | you want to be conservative with the answers you accept
           | (though my second example shows that it isn't always
           | conservative), and models don't have feelings to hurt if they
           | are docked points for not being precise enough. Humans, of
           | course, are more sensitive.
        
       | foota wrote:
       | In case anyone from the team is watching, the colab link at the
       | bottom is broken.
        
         | luismmolina wrote:
         | it is working for me
        
         | craffel wrote:
         | Thanks, fixed!
        
       | lanekelly wrote:
       | Anyone know what's new in the blogpost? T5 has been out for a few
       | months now.
        
         | craffel wrote:
         | The blogpost has a summary of our paper from October (a bit
         | late, sorry!) but also has some (fun?) new results on closed-
         | book question answering and fill-in-the-blank text generation.
        
       | thatcherc wrote:
       | The 'fill in the N blanks' results at the end are fascinating!
       | N<64 are all pretty normal, but then for N=64 and N=512, it
       | starts going on about the old 1930s cookbook it has and its grad
       | school experiences! Wild. I think I would not be able to
       | distinguish this from a selection of real Amazon reviews or
       | similar informal text.
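       | 
       | If you want to poke at the span-filling behavior yourself,
       | here's a hypothetical sketch using the Hugging Face
       | transformers library (the blog's sized-blank demo is
       | presumably a fine-tuned variant; the raw pre-trained
       | checkpoint fills sentinel tokens like <extra_id_0> instead):
       | 
       |     # pip install transformers sentencepiece torch
       |     from transformers import (T5Tokenizer,
       |                               T5ForConditionalGeneration)
       | 
       |     tokenizer = T5Tokenizer.from_pretrained("t5-base")
       |     model = T5ForConditionalGeneration.from_pretrained("t5-base")
       | 
       |     # <extra_id_0> marks the span the model should fill in
       |     text = ("I love this old 1930s cookbook because "
       |             "<extra_id_0> in every recipe.")
       |     input_ids = tokenizer.encode(text, return_tensors="pt")
       |     out = model.generate(input_ids, max_length=20)
       |     print(tokenizer.decode(out[0]))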
        
       | ComputerGuru wrote:
       | Isn't that the name of Microsoft's code generation templating
       | language?
        
         | zamalek wrote:
         | That's T4. The naming is pretty unfortunate.
        
       | atomoton wrote:
       | Wonder how this would compare with Watson at playing Jeopardy...
        
         | halflings wrote:
         | I would assume most QA (question answering) models blow Watson
         | out of the water. A lot has been done since then. See:
         | https://aclweb.org/aclwiki/Question_Answering_(State_of_the_...
        
           | benrbray wrote:
           | Sure, but Jeopardy is all about AQ (Answer-Questioning) :)
        
       | derefr wrote:
       | Now I'm curious what it'd give as output for the missing-tokens
       | task if you specialized it on understanding SVG vector image
       | data...
        
       ___________________________________________________________________
       (page generated 2020-02-24 23:00 UTC)