[HN Gopher] T5: The Text-to-Text Transfer Transformer
___________________________________________________________________
 
   T5: The Text-to-Text Transfer Transformer
 
   Author : theafh
   Score  : 74 points
   Date   : 2020-02-24 21:05 UTC (1 hour ago)
 
   (HTM) web link (ai.googleblog.com)
   (TXT) w3m dump (ai.googleblog.com)
 
   | kitsune_ wrote:
   | No mention of MASS by Microsoft? It was, afaik, one of the
   | first pretraining schemes for a full transformer outside of
   | XLM.
   |
   | Imho a bit unfortunate, as is calling the decoder or the
   | encoder of a transformer "a transformer", as has happened with
   | GPT and BERT, which now forces people to say "full transformer"
   | or use phrases like the title of the blog post.
   | craffel wrote:
   | We include MASS in our empirical survey (see e.g. section
   | 3.3.2 of our paper, https://arxiv.org/pdf/1910.10683.pdf).
   | FWIW, people were pre-training Transformers before MASS, e.g.
   | "Improving Language Understanding by Generative Pre-Training"
   | by Radford et al. from 2018. Even further back, "Semi-Supervised
   | Sequence Learning" by Dai et al. describe pre-training an RNN
   | encoder-decoder model for subsequent transfer.
   | kitsune_ wrote:
   | But Radford et al. pretrain just the decoder, which is
   | qualitatively different from a seq2seq approach such as MASS.
   | If we just look at the original paper from Vaswani, then
   | "pretraining a transformer" imho should only ever have meant
   | pretraining both the encoder and the decoder. Obviously that
   | ship has sailed.
   | j0e1 wrote:
   | Link to the paper: https://arxiv.org/abs/1910.10683
   | draw_down wrote:
   | What does "cola sentence" mean in the first example GIF?
   | (https://1.bp.blogspot.com/-o4oiOExxq1s/Xk26XPC3haI/AAAAAAAAF...)
   | modeless wrote:
   | The trivia game (https://t5-trivia.glitch.me/) needs a little
   | work.
   |
   | > Q: How did Gus Grissom, Ed White and Roger B. Chaffee die in
   | 1967?
   |
   | > You: "Apollo 1" WRONG
   |
   | > T5: "They were killed when their Apollo 1 spacecraft
   | exploded" WRONG
   |
   | > Correct answer: burned to death
   |
   | > Q: Which Alpine peak is known in Italy as Monte Cervino?
   |
   | > You: "Monte Cervino" CORRECT
   |
   | I wonder how many of the problems with this game could be
   | fixed by applying T5 itself to the answer grading.
   | craffel wrote:
   | Yes, unfortunately we have to rely on the very brittle "exact
   | match" method of evaluating whether an answer is correct. FWIW,
   | and perhaps surprisingly, this is the primary way
   | question-answering systems are evaluated in common benchmarks.
   | I totally agree that fine-tuning T5 for answer grading would be
   | super interesting!
   | modeless wrote:
   | I think it makes some sense to evaluate models like this, as
   | you want to be conservative with the answers you accept (though
   | my second example shows that it isn't always conservative), and
   | models don't have feelings to hurt if they are docked points
   | for not being precise enough. Humans, of course, are more
   | sensitive.
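 
   A minimal sketch of that "exact match" scoring, assuming the
   SQuAD-style answer normalization commonly used by QA benchmarks
   (the function names below are illustrative, not taken from the
   T5 codebase):
 
       import re
       import string
 
       def normalize(text):
           # Lowercase, drop punctuation and English articles, and
           # collapse whitespace (SQuAD-style normalization).
           text = "".join(ch for ch in text.lower()
                          if ch not in string.punctuation)
           text = re.sub(r"\b(a|an|the)\b", " ", text)
           return " ".join(text.split())
 
       def exact_match(prediction, references):
           # Score 1 only if the normalized prediction equals some
           # normalized reference string; any paraphrase scores 0,
           # which is why "Apollo 1" above is marked wrong.
           return any(normalize(prediction) == normalize(ref)
                      for ref in references)
 
       print(exact_match("Apollo 1", ["burned to death"]))   # False
       print(exact_match("The Matterhorn", ["Matterhorn"]))  # True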
   | foota wrote:
   | In case anyone from the team is watching, the colab link at
   | the bottom is broken.
   | luismmolina wrote:
   | It is working for me.
   | craffel wrote:
   | Thanks, fixed!
   | lanekelly wrote:
   | Anyone know what's new in the blogpost? T5 has been out for a
   | few months now.
   | craffel wrote:
   | The blogpost has a summary of our paper from October (a bit
   | late, sorry!) but also has some (fun?) new results on
   | closed-book question answering and fill-in-the-blank text
   | generation.
   | thatcherc wrote:
   | The "fill in the N blanks" results at the end are fascinating!
   | The outputs for N<64 are all pretty normal, but then for N=64
   | and N=512 it starts going on about the old 1930s cookbook it
   | has and its grad school experiences! Wild. I don't think I
   | would be able to distinguish these from a selection of real
   | Amazon reviews or similar informal text.
   | ComputerGuru wrote:
   | Isn't that the name of Microsoft's code generation templating
   | language?
   | zamalek wrote:
   | That's T4. The naming is pretty unfortunate.
   | atomoton wrote:
   | Wonder how this would compare with Watson at playing
   | Jeopardy...
   | halflings wrote:
   | I would assume most QA (question answering) models blow Watson
   | out of the water. A lot has been done since then. See:
   | https://aclweb.org/aclwiki/Question_Answering_(State_of_the_...
   | benrbray wrote:
   | Sure, but Jeopardy is all about AQ (Answer-Questioning) :)
   | derefr wrote:
   | Now I'm curious what it'd give as output for the missing-tokens
   | task if you specialized it on understanding SVG vector image
   | data...
___________________________________________________________________
(page generated 2020-02-24 23:00 UTC)
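 
   For the fill-in-the-blank behavior discussed above, a minimal
   sketch of T5's sentinel-based span filling, assuming the public
   Hugging Face "t5-base" checkpoint (the blog's interactive demo
   is a separately fine-tuned model that accepts a requested blank
   length N; the plain pre-trained checkpoint does not):
 
       from transformers import T5Tokenizer, T5ForConditionalGeneration
 
       tokenizer = T5Tokenizer.from_pretrained("t5-base")
       model = T5ForConditionalGeneration.from_pretrained("t5-base")
 
       # In pre-training, contiguous spans are replaced by sentinel
       # tokens and the model learns to emit the missing spans in
       # order, each preceded by its sentinel.
       text = ("Thank you for <extra_id_0> me to your party "
               "<extra_id_1> week.")
       input_ids = tokenizer.encode(text, return_tensors="pt")
       output_ids = model.generate(input_ids, max_length=32,
                                   num_beams=4)
       print(tokenizer.decode(output_ids[0],
                              skip_special_tokens=False))
       # Typical decode: "<extra_id_0> inviting <extra_id_1> last ..."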