[HN Gopher] Turing-NLG: A 17B-parameter language model
       ___________________________________________________________________
        
       Turing-NLG: A 17B-parameter language model
        
       Author : XnoiVeX
       Score  : 209 points
       Date   : 2020-02-10 17:31 UTC (5 hours ago)
        
 (HTM) web link (www.microsoft.com)
 (TXT) w3m dump (www.microsoft.com)
        
       | 0xff00ffee wrote:
        | B = Billion, not Byte. For a second I was like, WTF?
        
         | ngcc_hk wrote:
          | I thought I was the only one. Nothing in the title
          | signals deep learning.
          | 
          | But the article is fascinating nevertheless. Not sure
          | it's an AlphaGo-scale breakthrough, though.
        
       | saurkt wrote:
       | One of the team members from Project Turing. Happy to answer any
       | questions.
        
         | flowerlad wrote:
         | How does it compare to Google's BERT and do you have an online
         | demo?
         | 
          | Here's a demo of BERT: https://www.pragnakalp.com/demos/BERT-
         | NLP-QnA-Demo/
        
           | saurkt wrote:
            | (Similar to the response to another question.) BERT is
            | a language representation model, while Turing-NLG is a
            | language generation model (similar to GPT). They are
            | not directly comparable (each could potentially be
            | massaged to mimic the other, but that's not something
            | we have done yet).
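            | 
            | A rough sketch of the distinction, using the Hugging
            | Face transformers library (an assumption for
            | illustration; Turing-NLG itself is not exposed this
            | way):
            | 
            |   from transformers import (BertTokenizer,
            |       BertForMaskedLM, GPT2Tokenizer, GPT2LMHeadModel)
            | 
            |   # Representation (BERT-like): fill in a masked token
            |   # using context from both directions.
            |   btok = BertTokenizer.from_pretrained("bert-base-uncased")
            |   bert = BertForMaskedLM.from_pretrained("bert-base-uncased")
            |   ids = btok.encode("Paris is the [MASK] of France.",
            |                     return_tensors="pt")
            |   logits = bert(ids)[0]
            |   pos = ids[0].tolist().index(btok.mask_token_id)
            |   pred = logits[0, pos].argmax().item()
            |   print(btok.decode([pred]))      # likely "capital"
            | 
            |   # Generation (GPT-like): extend a prompt left to right.
            |   gtok = GPT2Tokenizer.from_pretrained("gpt2")
            |   gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
            |   ids = gtok.encode("The capital of France",
            |                     return_tensors="pt")
            |   out = gpt2.generate(ids, max_length=20)
            |   print(gtok.decode(out[0]))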
        
         | Tenoke wrote:
          | Any plans on training other (non-NLP) huge models using ZeRO?
         | 
         | Specifically for Transformers - any plans to train a big model
         | with a bigger context window?
         | 
         | Not that this one isn't very impressive, of course.
        
         | alexwg wrote:
         | Have you evaluated against the AI2 Leaderboard benchmarks?
         | https://leaderboard.allenai.org/
        
           | saurkt wrote:
           | Not yet. We will try to run against those benchmarks soon.
        
         | osipov wrote:
          | Why the lack of numbers on the more popular SQuAD and
          | GLUE benchmarks?
        
           | saurkt wrote:
            | SQuAD and GLUE are tasks for language representation
            | models -- i.e., BERT-like. This is a language
            | generation model -- GPT-like. Hence, SQuAD/GLUE test
            | sets are not really applicable. We are reporting on
            | the WikiText and LAMBADA sets that OpenAI also uses
            | for similar models (numbers are in the blog post).
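            | 
            | For reference, the WikiText number is a perplexity:
            | the exponential of the average per-token negative
            | log-likelihood. A minimal sketch:
            | 
            |   import math
            | 
            |   def perplexity(token_log_probs):
            |       # token_log_probs: natural-log probabilities the
            |       # model assigned to each token in the test set
            |       n = len(token_log_probs)
            |       return math.exp(-sum(token_log_probs) / n)
            | 
            |   # e.g. a model assigning p=0.1 to every token has
            |   # perplexity 10
            |   print(perplexity([math.log(0.1)] * 5))  # 10.0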
        
             | igravious wrote:
             | What's the difference between the two models?
        
       | riku_iki wrote:
        | unfortunately they abstained from participating in the
        | more popular SQuAD and GLUE benchmarks...
        
         | [deleted]
        
       | [deleted]
        
       | 01100011 wrote:
       | How long until the language models stabilize enough that we can
       | bake them into a low-cost, low-power chip for edge uses?
        
         | foota wrote:
          | I think this is largely unnecessary; can't things like TPUs
         | handle the inference?
        
           | the8472 wrote:
           | Putting all your speech/text onto cloud machines runs counter
           | to e2e encrypted messaging.
        
           | struxure wrote:
            | Using TPUs is expensive. Also, some applications may
            | need low latency.
        
       | galkk wrote:
       | Those summaries look impressive, although a bit repepetive
        
         | RobertDeNiro wrote:
         | What they don't tell you is that these summaries are always
          | hand-picked from a few that were generated.
        
           | XnoiVeX wrote:
            | Quite possible, but that also means there is an
            | opportunity to implement some sort of RL to choose the
            | best possible summary.
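            | 
            | The simplest non-RL version of that idea is plain
            | best-of-n reranking (a sketch; score_summary is a
            | hypothetical stand-in for a learned reward model,
            | ROUGE against a reference, or model log-likelihood):
            | 
            |   def best_of_n(generate, score_summary, doc, n=8):
            |       # Sample n candidate summaries, keep the one
            |       # the scoring function likes best.
            |       candidates = [generate(doc) for _ in range(n)]
            |       return max(candidates, key=score_summary)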
        
         | Schiphol wrote:
         | I see what you did there?
        
           | aabhay wrote:
           | I see what you see what I did there
        
       | Tenoke wrote:
       | I expect we'll see some very interesting, very big models
       | following it. I didn't dig too far into the code but the library
       | looks very easy to use and will open up a lot of doors for people
       | who have a few or a few thousand GPUs.
        
         | ghawkescs wrote:
         | I must be missing it, where did you find a link to the code?
        
           | Tenoke wrote:
           | The code for the distributed training library, not the model
           | - https://github.com/microsoft/DeepSpeed/
        
       | corporateslave5 wrote:
        | People are vastly underestimating the changes that are
        | about to come from NLP. The basic ideas of how to get
        | language models working are just about in place.
        | Transformer networks and recent innovations like GPT-2,
        | Google's Reformer model, etc., are precursors to the real
        | machine learning boom. Machine learning as we have known
        | it has been stuck as an optimization tool, used for
        | computer vision here and there. NLP, and with it the
        | ability to create, synthesize, and understand content,
        | will change the internet.
       | 
       | More than that, I think NLP will unlock new ways of interacting
       | with computers. Computers will be able to handle the ambiguity of
       | human language, transcending their rigid "only do exactly what
       | you tell them" models of the world.
       | 
       | Edit:
       | 
        | Adding this to give more technical context. I think most
        | people don't know where the line currently is between
        | what's possible and what's not, or what we are on the cusp
        | of. And we are on the cusp of a lot.
       | 
       | A quick explanation of one area is here:
       | 
        | Basically, transformer models are the best for NLP. They
        | use something called attention-based mechanisms, which
        | allow the model to draw correlations between pieces of
        | text/tokens that are
       | far apart. The issue is that this is an O(n^2) operation. So the
       | model is bounded by the context window, which is currently mostly
        | at 512 tokens, and is thus bounded in how much it can
       | understand. Recent innovations, and further study, will broaden
       | the context window, and thus unlock better reading comprehension
       | and context understanding. For instance, the ability to answer a
       | question using a piece of text is mostly stuck at just finding
       | one paragraph. The future will see models that can find multiple
       | different paragraphs, understand how they relate, pull the
       | relevant information, and synthesize it. This sounds like a minor
       | step forwards, but its important. This will unlock better
       | conversational abilities, but also, better ways to understand how
       | different pieces of textual information relate. The scattershot
       | of information across the internet can go away. Computers can
       | better understand context to act on human intention through
       | language, unlocking the ability to handle ambiguity. This will
       | change the internet.
       | 
        | Again, to emphasize: these models only started showing up
        | in 2017! The progress has been rapid.
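        | 
        | A minimal sketch of where that O(n^2) comes from (numpy;
        | illustrative only -- real transformers add learned
        | projections, multiple heads, and masking):
        | 
        |   import numpy as np
        | 
        |   def attention(Q, K, V):
        |       # Q, K, V: (n, d) queries, keys, values
        |       n, d = Q.shape
        |       scores = Q @ K.T / np.sqrt(d)    # (n, n): quadratic
        |       w = np.exp(scores - scores.max(-1, keepdims=True))
        |       w /= w.sum(-1, keepdims=True)    # softmax over keys
        |       return w @ V                     # (n, d)
        | 
        |   n, d = 512, 64   # a 512-token context window
        |   Q, K, V = (np.random.randn(n, d) for _ in range(3))
        |   out = attention(Q, K, V)  # memory/time scale with n**2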
        
         | gzu wrote:
         | I can't wait for the day we see "deep-dream" styled literary
         | works.
        
           | jandrese wrote:
           | There have been plenty of "AI written" textual works, like
           | the AI Dungeon for a D&D game, or the AI Recipe Generator
           | that attempts to make something that looks like a cooking
           | recipe.
           | 
           | For the most part they aren't successful because the "AI"
           | isn't smart enough to have a goal in mind, so they end up
            | just monkey-cheesing everything: pasting together
            | snippets in ways that are usually grammatically
            | correct but make no sense.
        
             | okigan wrote:
              | Any books/papers that address "goal"-oriented NLP,
              | or discussion of hybrid approaches?
        
               | ctoth wrote:
               | There's interesting research out of Deepmind on two parts
               | of this, which are using the Transformer model in
               | Reinforcement Learning contexts[0] and creating textual
               | GANs[1]. As you are probably aware, GANs are one of the
               | important tools that have driven forward image synthesis
               | and until recently it was impossible to apply them to
               | text, so I expect this to push us quite a bit forward.
               | There's also ongoing work in the selection of the metric
               | to use to evaluate the generated text, and discriminate
               | between human and machine-generated text.
               | 
               | [0] https://arxiv.org/abs/1910.06764
               | 
               | [1] https://arxiv.org/abs/1905.09922
        
               | jandrese wrote:
               | The ability to set your own goals and task yourself to
               | achieve them is the essence of AI. Not "AI" as we know it
               | today, but Sci-fi AI where it's a machine person.
        
               | sgt101 wrote:
               | Is it? It's pretty clear that this kind of activity isn't
               | that common in humans - mostly we pick up our goals from
               | cues and drivers in our environment and society.
        
               | joe_the_user wrote:
               | Indeed, but that kind of describes the gulf between
               | current language processing and present AI. Present AI
                | generates tokens that seem to have meaning, or
                | seem like an appropriate response to a statement,
                | but after 2-3 paragraphs it becomes evident that
                | there's no substantial relation to either
                | underlying meaning or underlying goals.
               | 
                | Part of this is that "underlying meaning" is an
                | intuitive way to describe things, but whatever is
                | underlying here is more tenuous than a classical
                | logic/GOFAI model of the world, yet more "solid"
                | than a long, clever stream of associations.
        
               | okigan wrote:
                | If one interprets "set your own goals" as the
                | task of AI, then it's probably Sci-Fi AI, but
                | that wasn't my question.
                | 
                | Let's say we have a goal: "evaluate a person's
                | impression of a particular book/topic/etc."
                | 
                | So the goal would be to have a conversation on
                | this and related topics that would (re)construct
                | the person's impression.
                | 
                | Hence my question whether there are any
                | publications/articles that have explored that.
        
           | SQueeeeeL wrote:
            | But what would be the point... after the novelty wears
            | off, no one considers deep dream anything other than
            | an outcome of the system, not art with an intended
            | meaning (unless the point is art without meaning).
        
         | the8472 wrote:
         | > More than that, I think NLP will unlock new ways of
         | interacting with computers. Computers will be able to handle
         | the ambiguity of human language, transcending their rigid "only
         | do exactly what you tell them" models of the world.
         | 
          | Such as Google's Duplex?
        
           | codingslave wrote:
            | Yes, but Duplex doesn't work and was too early.
        
         | m0zg wrote:
          | I think you're the one who overestimates how much this
          | will affect NLP. I'd say the bulk of what was possible
          | to deliver with this is already here; the subsequent
          | changes will be incremental.
         | 
         | The cold hard truth about statistical (and by extension, deep)
         | NLP is that it's just a fancy way of counting numbers mostly.
         | The only way to get to _real_ language understanding is AGI,
         | and _nobody_ is working on that. You fundamentally cannot
         | interact comfortably with a human if your system does not have
         | probabilistic, contextualized cognition, and can't incorporate
         | knowledge about the world.
        
           | corporateslave5 wrote:
            | I respectfully disagree. I've been in the field full-
            | time for a few years, and I watch the state of the art
            | closely. It's hard
           | to see the thought/theoretical progression of deep
           | transformer models by just reading posts here and there.
           | 
           | I'm not saying these NLP methods will be some kind of AI,
           | just that they will produce products, content, and ways of
           | interacting with the world that are categorically different
           | from what we have seen in the past.
           | 
            | For instance, question answering tasks have only
            | recently been able to:
            | 
            | Find an answer in a text document that spans multiple
            | non-contiguous paragraphs.
           | 
           | Understand context across a whole book.
           | 
            | The context window of current NLP is stuck at 512
            | tokens, mostly because of computational complexity.
            | This has been broken just recently by the Reformer
            | model, which is a primitive, early way to get around
            | the computation costs of attention mechanisms.
           | 
           | Just wait. The ideas are there. They just take time to
           | refine.
        
             | skywhopper wrote:
             | Since you seem to be observing this closely, I'm curious
             | what steps are being taken to identify and avoid bias in
             | the results generated by these systems?
        
             | p1esk wrote:
             | What exactly do you expect from these Transformer-based
             | models? This particular one is underwhelming because it
             | provides a tiny improvement over the previous largest one
             | (from Nvidia) with more than double the size.
             | 
              | The fundamental limitation of these "optimization
              | tools", as you call them, is that they don't have
              | any common sense, nor any way to query an external
              | source of information (e.g. Wikipedia) or ask a
              | human to clarify.
             | 
             | Another big problem is we don't have any way to do quality
             | filtering on the outputs. From my experiments with GPT-2,
             | it produces one interesting paragraph of text out of 20 -
             | if you squint at it really hard. And most of those 20 don't
             | make much sense at all.
             | 
             | So no, the existing ideas are definitely not enough. Maybe
             | some novel hybrid of symbolic AI with statistical
             | optimization will lead to a breakthrough. This one does not
             | strike me as anything other than "let's use moar weights!!"
        
               | ctoth wrote:
               | Some papers which you might find interesting:
               | 
               | Lin, B. Y., Chen, X., Chen, J., & Ren, X. (2019). KagNet:
               | Knowledge-Aware Graph Networks for Commonsense Reasoning.
               | 2822-2832. https://doi.org/10.18653/v1/d19-1282
               | 
               | Liu, W., Zhou, P., Zhao, Z., Wang, Z., Ju, Q., Deng, H.,
               | & Wang, P. (2019). K-BERT: Enabling Language
               | Representation with Knowledge Graph. Retrieved from
               | http://arxiv.org/abs/1909.07606
               | 
                | Trinh, T. H., & Le, Q. V. (2019). Do Language
                | Models Have Common Sense? ICLR, 1-12.
               | 
               | Ostendorff, M., Bourgonje, P., Berger, M., Moreno-
               | Schneider, J., Rehm, G., & Gipp, B. (2019). Enriching
               | BERT with Knowledge Graph Embeddings for Document
               | Classification. Retrieved from
               | http://arxiv.org/abs/1909.08402
        
               | sanxiyn wrote:
               | > they don't have any common sense
               | 
               | What do you mean by this? Of course they do, learned from
                | their training data. For example, here is a quote from
               | conversation 38 of https://github.com/google-
               | research/google-research/blob/mast...
               | 
                | Human: Do you like Korean food in general?
                | 
                | Meena: It's okay. I like beef bulgogi, but I'm not
                | a huge fan of kimchi.
               | 
               | It seems to me Meena "knows" bulgogi and kimchi are
               | Korean foods. Isn't that common sense? If it isn't, what
               | do you mean by "common sense"?
        
               | sgt101 wrote:
               | I'd buy that if Meena could infer and reason about her
               | own answers.
               | 
               | Human: Do you like Korean food in general?
               | 
               | Meena: It's okay. I like beef bulgogi, but I'm not a huge
               | fan of kimchi.
               | 
               | Human: Ok what should I shop for ?
               | 
               | Meena : You've got almost everything but you need a pear,
               | the steak and some ginger.
               | 
               | The problem with language models as commonsense is that
               | they are collections of patterns and associations, and
               | that they don't have inference models or solvers - unlike
               | my dog for example!
        
               | p1esk wrote:
               | Try asking it a follow up question not commonly found in
               | the training data: such as "do you think bulgogi would
               | grow on Mars?", and see what kind of gibberish you will
               | get in response. Moreover, the model has no way of self-
               | diagnosing whenever it produces gibberish.
        
             | m0zg wrote:
             | I'm a researcher in the field. Not in NLP anymore, but I
             | worked on that as well, years ago, and I keep up with the
             | research. You can't "understand the context" of "War and
             | Peace" unless you have real, actual AGI. I doubt actually
             | it can be fully understood at all when translated to
             | English and read by someone without the right cultural
             | background. This is an extreme example, chosen to make it
             | easy to see that it applies to any non-trivial text.
             | 
             | Let's take a question answering example. Take just about
             | any recent deep learning paper and try to answer detailed,
             | higher level questions against it. To use a concrete
             | example, take MobileNet V3 paper and ask your system "do I
             | put activation before or after squeeze and excitation"
             | (correct answer is "before"), or "do I need a bias in
             | squeeze and excitation module" (correct answer is "it
             | depends on the task"). You won't be able to, because a lot
             | of things are just _assumed_, just like in any other
             | realistic example of text written for human consumption.
             | The facts are encoded externally as information about the
              | world, and they're so fine-grained and contextual
              | that we don't even know how to begin incorporating
              | them into the answers, let alone do so contextually
              | and probabilistically, like the human mind does.
        
               | corporateslave5 wrote:
                | Maybe. I take a really optimistic view of
                | attention-based mechanisms, as I explained in the
                | addition to my original post. The recent Reformer
                | paper introduces a new way of computing attention,
                | a start toward building a model that can, in some
                | way, encode the relationships between different
                | parts of War and Peace. The bottleneck right now
                | is computation; we don't know how well these
                | models can learn when that bottleneck is removed!
                | 
                | I'm optimistic because I believe the contextual
                | information you are describing is already there in
                | the vast expanse of the internet.
               | 
               | But I will also add, I think none of this will spawn AI,
               | just that it will spawn new technologies that are
               | categorically different.
        
               | m0zg wrote:
               | I think it will merely spawn technologies that are less
               | fragile, yet still too fragile and unsophisticated to be
                | practical for a full-blown conversational user interface,
               | the likes of which you see in the Avengers movies. It
                | will (and already does) make a difference in simpler tasks
               | where you can get away with just counting numbers, such
               | as search/retrieval, simple summarization, constrained
               | dialogue in chatbots, stuff like that.
        
             | Barrin92 wrote:
             | >Find an answer in a text document that spans multiple non-
             | contiguous paragraphs
             | 
             | >Understand context across a whole book.
             | 
              | To be truly useful when it comes to context, a
              | model doesn't just need to correlate information
              | across some text; it also needs general knowledge
              | and understanding, so that it can produce knowledge
              | which is only implicit or not even present in the
              | text itself.
              | 
              | You can make the transformers as large as you want;
              | NLP models still fundamentally suck at answering
              | trivial questions like "If Pete was alive in 2000
              | and alive in 2050, was he alive in 2025?"
        
             | b3kart wrote:
              | I am sorry, but you come across as extremely over-
              | enthusiastic, without too many specifics beyond
              | "we're just about to figure it all out", "you just
              | wait", and "it's gonna revolutionize _everything_".
              | We've seen this before with ImageNet, haven't we?
              | When everybody thought that because ConvNets were
              | crushing all the older methods, AI was right around
              | the corner. Well, it turned out to be much more
              | complicated than that, didn't it. Transformers are
              | great (well, if you have the compute, that is),
              | don't get me wrong, but let's not get ahead of
              | ourselves. The field is over-hyped as it is.
        
               | bonoboTP wrote:
               | > When everybody thought that because ConvNets are
               | crushing all the older methods, AI is right around the
               | corner. Well, it turned out to be much more complicated
               | than that, didn't it.
               | 
                | I don't think anyone familiar with the area
                | thought that ConvNets would give us AGI.
               | 
               | However, their effect has been huge! It's hard to
               | overstate this. Computer vision used to be a small niche
               | topic, with tons of effort required to get something
               | working even on simple images. The quality of today's
               | ConvNet predictions is way beyond anybody's imagination
               | in around 2010. Models built around that time were like a
               | house of cards. Extremely carefully crafted for specific
               | scenarios, where moving one threshold a bit would destroy
               | your output.
        
               | codingslave wrote:
               | I'm not saying this will spawn AI. Just that, like
               | computer vision was essentially solved by CNNs, NLP will
               | be solved by transformer models.
        
               | thfuran wrote:
               | >computer vision was essentially solved by CNNs
               | 
               | That is a rather contentious claim.
        
               | b3kart wrote:
               | To say the least.
        
         | heavyarms wrote:
         | I would agree with you to a certain extent, but I still think
         | there is a big missing component to make this a reality. Larger
         | and more accurate general language models are great, but to
         | enable use cases other than categorization, translation,
         | summarization, etc., there will almost certainly have to a be
         | contextualized knowledge graph layer. This is basically what I
         | assume you mean when you say "the ability to create,
         | synthesize, and understand content." The way I see it,
         | transformer-based general language models will be the first and
          | last step of an NLP system. In other words, they will do the raw
         | processing of the input and will do the "take this output and
         | put it in natural language based on this context" portion.
         | Automating the part in the middle, the part that actually
         | understands what the text means and can do logic with it,
         | there's still no progress on that AFAIK.
         | 
         | This is similar to how computer vision models work great in
         | most cases, but to build a self-driving car you still need all
         | the other components that do path planning, predicting what the
         | cars around you will do based on the state of the environment,
         | etc.
        
           | codingslave wrote:
           | Replying from my laptop.
           | 
           | I agree, there needs to be a way to represent relationships
           | between information. I personally don't think knowledge
            | graphs will be the ones to do it, not because they don't work,
           | but because of how imperfect they are, in the data quality
           | sense.
           | 
           | See this paper here:
           | 
           | "DIFFERENTIABLE REASONING OVER A VIRTUAL KNOWLEDGE BASE"
           | https://openreview.net/pdf?id=SJxstlHFPH
           | 
            | It is a recent effort, among many, by Google Research
            | to build a model that can view a document as a
            | knowledge graph: instead of explicitly tying pieces of
            | the document to a graph, the idea is to create a graph
            | from the document. This paper is a bit different from
            | that; they do input a knowledge graph for training,
            | but I think the direction they are headed in has a ton
            | of room to evolve. The trick is that transformer
            | models have unlocked the ability to understand the
            | text, so all of this "quasi knowledge graph
            | extraction" I was just describing has only recently
            | become possible! There's no research on it, because
            | the baseline understanding of tokens has been too
            | primitive. This is why there is so much room to grow:
            | BERT has unlocked new methods, and it can be used as a
            | base for a ton of new NLP.
           | 
            | Just to emphasize again: I'm not saying what I
            | outlined above will be a good way to do it, just that
            | ideas like this could only be tested recently. There
            | are a million new ways to spin this problem.
        
             | cfoster0 wrote:
             | Related: a recent paper from Antoine Bosselut and Yejin
             | Choi explores dynamically constructing a context-specific,
             | common sense knowledge graph using a transformer, in the
             | context of question answering.
             | 
             | https://arxiv.org/abs/1911.03876
        
         | joe_the_user wrote:
         | _Computers will be able to handle the ambiguity of human
         | language, transcending their rigid "only do exactly what you
         | tell them" models of the world._
         | 
          | So, are there reasonable examples now of these models
          | handling semantic context? So far, what I have seen is
          | generated text where the lack of _understanding_ takes
          | three paragraphs to become obvious rather than one.
         | 
         | Human language is this marvelous framework involving symbols
         | associating with other symbols as well as to well-known and
         | vaguely-guessed facts about the world.
         | 
         | Human relations are very robust and, for example, two people
          | can have a longish conversation where, at the end, they realize
         | they're talking about two different people (or different days
         | or events). But in those circumstances, they can correct and
         | adjust. "Solid" understanding is there but it's under a lot of
         | layers of social cues and protocols and multiple meanings.
        
         | [deleted]
        
         | lostcolony wrote:
         | >More than that, I think NLP will unlock new ways of
         | interacting with computers. Computers will be able to handle
         | the ambiguity of human language, transcending their rigid "only
         | do exactly what you tell them" models of the world.
         | 
         | As exciting as this sounds, I can't help but feel that given
         | -we- haven't figured out how to handle the ambiguity of human
         | language, I'm not convinced a computer attempting to is really
         | markedly better for many use cases than requiring exactness.
         | But operating at a human level of 'understanding', and being
         | broadly accessible, may be enough to change the world.
         | Hopefully for the better.
        
         | lopmotr wrote:
         | Can you explain your reasoning? It's easy to imagine a new
         | invention or idea will revolutionize everything because it's
         | never been done before and feels powerful, but even if it
         | works, it might not. It sounds like you believe NLP will enable
         | more general voice control of computers than Siri/Alexa/etc.
         | But will that really be much more significant than people
         | expect? Google is already pretty good at understanding
         | ambiguous queries. It's eliminated the need to know where to
         | look for information. Or are you talking about the computer
         | writing code or doing business itself?
        
           | codingslave wrote:
           | Replying on my laptop.
           | 
           | Basically, transformer models are the best for NLP. They use
            | something called attention-based mechanisms, which allow the
           | model to draw correlations between pieces of text/tokens that
           | are far apart. The issue is that this is an O(n^2) operation.
           | So the model is bounded by the context window, which is
            | currently mostly at 512 tokens, and is thus bounded in how
           | much it can understand.
           | 
           | Recent innovations, and further study, will broaden the
           | context window, and thus unlock better reading comprehension
           | and context understanding.
           | 
           | For instance, the ability to answer a question using a piece
           | of text is mostly stuck at just finding one paragraph. The
           | future will see models that can find multiple different
           | paragraphs, understand how they relate, pull the relevant
           | information, and synthesize it. This sounds like a minor step
            | forward, but it's important.
           | 
            | This will unlock better conversational abilities, but also
           | better ways to understand how different pieces of textual
           | information relate. The scattershot of information across the
           | internet can go away. Computers can better understand context
           | to act on human intention through language, unlocking the
           | ability to handle ambiguity. This will change the internet.
        
             | skywhopper wrote:
             | If this is really where the researchers think these tools
             | are headed (and I don't really doubt you on that point),
             | then this is incredibly dangerous stuff. No matter how good
             | your system is, the impact of implicit, unintentional, and
              | non-targeted bias on the sorts of content these
              | systems will produce is huge. But expose them to the
              | levels of
             | intentional manipulation present on the Internet of today,
             | and these models don't stand a chance of producing
             | something that safely does what you claim.
             | 
             | I'm happy to be wrong about this, but I'm not seeing any
             | discussion about the safety and security of using these
             | systems. And if it's not even being discussed, we can be
             | sure nothing's actually being done about it. Selling
             | promises of active-agent computers interpreting human
             | intent and summarizing information from the Internet
             | without addressing this concern is irresponsible at this
             | point.
        
         | K0SM0S wrote:
         | If what you claim is true, it's not just the internet that will
         | be changed by NLP but all of civilization!
        
         | Tenoke wrote:
          | > mostly at 512 tokens
         | 
          | GPT-2 uses a 1024-token context window, which does work
          | out to a fair amount (given the massive vocab of 50k+,
          | which means a token can be more than a word), though of
          | course it's pretty
         | limited. Google's recent Reformer[0] allows you to do attention
         | much more cheaply and I'm currently training a Reformer model
         | that isn't quite as big but has a context of ~64k tokens
         | (though a much smaller vocab). I'm not completely sure if this
         | is the solution but it looks like a step in that direction and
         | so far the model is doing pretty good (I also plan to post the
          | weights when I'm finished, though I am not sure Google
          | doesn't just plan to do that themselves).
         | 
          | I am somewhat disappointed they went with just 1024 for
          | this model too, though.
         | 
         | 0. https://ai.googleblog.com/2020/01/reformer-efficient-
         | transfo...
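          | 
          | To make the "more cheaply" part concrete: Reformer
          | buckets similar queries/keys with LSH, but the simplest
          | illustration of the same "don't attend to everything"
          | idea is chunked local attention (a sketch of that
          | simpler cousin, not Reformer's actual algorithm):
          | 
          |   import numpy as np
          | 
          |   def softmax(x):
          |       e = np.exp(x - x.max(-1, keepdims=True))
          |       return e / e.sum(-1, keepdims=True)
          | 
          |   def chunked_attention(Q, K, V, c=64):
          |       # Each token attends only within its own block of
          |       # c positions: O(n*c) total instead of O(n^2).
          |       n, d = Q.shape
          |       out = np.zeros_like(Q)
          |       for s in range(0, n, c):
          |           q, k, v = Q[s:s+c], K[s:s+c], V[s:s+c]
          |           out[s:s+c] = softmax(q @ k.T / np.sqrt(d)) @ v
          |       return out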
        
           | corporateslave5 wrote:
            | Yep, the number has been expanding. I just picked 512
            | because BERT uses that. But yeah, 50k and beyond
            | unlocks new ways of using context. I can't wait to see
            | how they optimize and improve the Reformer model.
            | 
            | I expect there to be an improvement like there was for
            | 
            | BERT -> ALBERT
        
         | KKKKkkkk1 wrote:
         | Feels like I've heard this tune before. Actually, I think it
         | was first played in the 1950s.
        
         | skywhopper wrote:
         | Can you explain more about what you mean by "able to handle the
         | ambiguity of human language"? Such a claim goes far beyond
         | language processing itself. Such a capability would have to
         | involve cultural and interpersonal dynamics, domain knowledge,
         | and definitive agency on the part of the computer.
         | 
         | And all of _that_ is just to interpret sincere, honest attempts
         | at communication. Can it safely and appropriately handle humor,
         | irony, and sarcasm? What about coordinated malicious attacks?
        
       | [deleted]
        
       | lowdose wrote:
        | This is GPT-2 x 10. For anyone wondering what GPT-2 can
        | do, look at this baffling subreddit and marvel at how one
        | GPT-2 model trained for $70k spits out better comedy than
        | everybody on the payroll of Netflix combined.
       | 
       | https://www.reddit.com/r/SubSimulatorGPT2/
        
         | jeromebaek wrote:
          | Gosh, this is unsettling. My brain is literally getting
          | stuck in an infinite loop trying to read these and
          | coherently put them together.
        
         | woodgrainz wrote:
         | Comedy is right.
         | 
         | "I'm really starting to get worried about my Higgs Boson (HBN)
         | after watching some videos on YouTube" [0]
         | 
         | "This repost and all of your posts are garbage." [1]
         | 
         | "It's the most random and unoriginal shit I've ever read." [2]
         | 
         | [0]
         | https://www.reddit.com/r/SubSimulatorGPT2/comments/f1sqyh/an...
         | 
         | [1]
         | https://www.reddit.com/r/SubSimulatorGPT2/comments/f1vfnv/wh...
         | 
         | [2]
         | https://www.reddit.com/r/SubSimulatorGPT2/comments/f1o83a/fo...
        
         | paganel wrote:
        | There's nothing inherently funny about entries like this
        | one [1]. I mean, there is, as in it's sort of funny how
        | the AI got tricked so quickly into doing incest jokes, but
        | I guess that was not the research team's intended goal.
         | 
         | [1]
         | https://old.reddit.com/r/SubSimulatorGPT2/comments/f1ifp6/my...
        
           | RicardoLuis0 wrote:
            | If you look closely at the usernames, each "bot" is
            | trained on a specific subreddit. Taking the subreddit
            | context into account -- for this post in particular,
            | r/twosentencehorror -- I'd say it isn't half-bad.
        
         | istjohn wrote:
         | Simulating r/StonerPhilosophy[1]
         | 
         | Post: Do we live in a simulation?
         | 
         | Comment: I just realized, we are a simulation, and we are a
         | simulated simulation.
         | 
         | Comment: We're all in a simulation. We're still here. We're all
         | in this little ball together
         | 
         | Comment: The simulation hypothesis states that we are in a
         | simulation. Which means that there is a possibility that we are
         | not in a simulation.
         | 
         | [1]
         | https://www.reddit.com/r/SubSimulatorGPT2/comments/ez6qtj/do...
        
         | SkyBelow wrote:
          | My favorite so far is the science bot. 75% of the top-
          | level posts are explaining why the submission was
          | removed.
        
         | minimaxir wrote:
          | GPT-2 x 10 is misleading; the model size is 10x, sure,
          | but that doesn't _necessarily_ mean the output will be
          | 10x better. For r/SubSimulatorGPT2, it did go from 355M
          | to 1.5B recently, but the quality isn't necessarily 4x
          | (although it did improve).
         | 
         | I'm more interested in _shrinking_ models that maintain the
         | same level of generative robustness (e.g. distillation, with
          | distilGPT2).
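          | 
          | For the curious, the core of that distillation idea
          | fits in a few lines (a sketch, assuming PyTorch and
          | teacher/student models that emit per-token logits over
          | the same vocabulary):
          | 
          |   import torch.nn.functional as F
          | 
          |   def distill_loss(student_logits, teacher_logits, T=2.0):
          |       # KL between temperature-softened distributions;
          |       # the T*T factor keeps gradient magnitudes
          |       # comparable across temperatures.
          |       p_t = F.softmax(teacher_logits / T, dim=-1)
          |       log_p_s = F.log_softmax(student_logits / T, dim=-1)
          |       return F.kl_div(log_p_s, p_t,
          |                       reduction="batchmean") * T * T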
        
         | speedgoose wrote:
          | It's definitely better than the original, which used
          | Markov chains. It fits this use case very well, and in
          | my opinion only this use case.
         | 
         | GPT2 is still very random and quite stupid.
         | 
         | You start it with your love for your girlfriend as a context,
         | she becomes a cam girl into hard core anal two paragraphs
         | later. You start with religion, "Muslims must be exterminated".
          | You start with software and you get a description of
          | nonexistent hardware with instructions on how to set up
          | a VPN in the middle. You start with news, and you can
          | read that China supports the Islamic State.
         | 
          | That's cool, because it has more context than Markov
          | chains, which usually have only 3 words of context, but
          | there's still a long way to go before I trust anything
          | generated by this kind of algorithm.
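          | 
          | For comparison, "3 words of context" means something
          | like this: a minimal word-level Markov chain of the
          | kind the original r/SubredditSimulator used (a sketch,
          | not the bot's actual code):
          | 
          |   import random
          |   from collections import defaultdict
          | 
          |   def build_chain(text, order=3):
          |       # Map each 3-word window to the words observed
          |       # to follow it.
          |       words = text.split()
          |       chain = defaultdict(list)
          |       for i in range(len(words) - order):
          |           key = tuple(words[i:i + order])
          |           chain[key].append(words[i + order])
          |       return chain
          | 
          |   def generate(chain, order=3, length=50):
          |       state = random.choice(list(chain))
          |       out = list(state)
          |       while len(out) < length:
          |           nxt = chain.get(tuple(out[-order:]))
          |           if not nxt:
          |               break
          |           out.append(random.choice(nxt))
          |       return " ".join(out)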
        
           | dilap wrote:
           | This stuff is pretty much indistinguishable from the real
           | thing...
           | 
            | https://www.reddit.com/r/SubSimulatorGPT2/comments/f1pypf/so...
        
           | undoware wrote:
           | As a trans woman I can't help but notice that you rank me
           | being your girlfriend at about the same level of beyond-the-
           | pale-ness as a genocide.
           | 
           | The feeling is mutual
        
             | speedgoose wrote:
              | Sorry, I didn't think it could be interpreted like
              | this; it was indeed inappropriate. I've edited it.
        
       | danharaj wrote:
       | If your model has 17 billion parameters, you missed some.
        
       ___________________________________________________________________
       (page generated 2020-02-10 23:00 UTC)