[HN Gopher] Turing-NLG: A 17B-parameter language model ___________________________________________________________________ Turing-NLG: A 17B-parameter language model Author : XnoiVeX Score : 209 points Date : 2020-02-10 17:31 UTC (5 hours ago) (HTM) web link (www.microsoft.com) (TXT) w3m dump (www.microsoft.com) | 0xff00ffee wrote: | B = Billion, not Byte. For a second I was like, WTF? | ngcc_hk wrote: | I thought I was the only one. There is no trigger to deep | learning. | | But the article is fascinating nevertheless. Not sure it is an | AlphaGo-level breakthrough. | saurkt wrote: | One of the team members from Project Turing. Happy to answer any | questions. | flowerlad wrote: | How does it compare to Google's BERT, and do you have an online | demo? | | Here's a demo of BERT: https://www.pragnakalp.com/demos/BERT-NLP-QnA-Demo/ | saurkt wrote: | (Similar to the response for another question.) BERT is a | language representation model, while Turing-NLG is a language | generation model (similar to GPT). They are not directly | comparable (they can potentially be massaged to mimic the | other, but that is not something we have done yet). | Tenoke wrote: | Any plans on training other (non-NLP) huge models using ZeRO? | | Specifically for Transformers - any plans to train a big model | with a bigger context window? | | Not that this one isn't very impressive, of course. | alexwg wrote: | Have you evaluated against the AI2 Leaderboard benchmarks? | https://leaderboard.allenai.org/ | saurkt wrote: | Not yet. We will try to run against those benchmarks soon. | osipov wrote: | Why the lack of numbers on the more popular SQuAD and GLUE | benchmarks? | saurkt wrote: | SQuAD and GLUE are tasks for language representation models | -- aka BERT-like. This is a language generation model -- GPT-like. | Hence, the SQuAD/GLUE test sets are not really applicable. | We are reporting on the WikiText and LAMBADA sets that OpenAI | also uses for similar models (numbers are in the blog post).
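The WikiText/LAMBADA evaluation mentioned in the reply above comes down to perplexity: the exponential of the model's average per-token negative log-likelihood. A minimal sketch of the metric itself (the per-token probabilities below are made up for illustration, not from any real model):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over all tokens."""
    n = len(token_logprobs)
    nll = -sum(token_logprobs) / n
    return math.exp(nll)

# Hypothetical natural-log probabilities the model assigned to three tokens.
logps = [math.log(0.25), math.log(0.5), math.log(0.125)]
print(round(perplexity(logps), 4))  # -> 4.0
```

Lower is better: a perplexity of 4.0 means the model was, on average, as uncertain as a uniform choice among 4 tokens. Real evaluations run this over an entire held-out corpus such as WikiText-103.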
| igravious wrote: | What's the difference between the two models? | riku_iki wrote: | unfortunately, they abstained from participating in the more popular | SQuAD and GLUE benchmarks. | [deleted] | [deleted] | 01100011 wrote: | How long until the language models stabilize enough that we can | bake them into a low-cost, low-power chip for edge uses? | foota wrote: | I think this is largely unnecessary; can't things like TPUs | handle the inference? | the8472 wrote: | Putting all your speech/text onto cloud machines runs counter | to e2e encrypted messaging. | struxure wrote: | Using TPUs is expensive. Also, some applications may need | low latency. | galkk wrote: | Those summaries look impressive, although a bit repetitive | RobertDeNiro wrote: | What they don't tell you is that these summaries are always | hand-picked from a few that were generated. | XnoiVeX wrote: | Quite possible, but that also means that there is an | opportunity to implement some sort of RL to choose the best | possible summary. | Schiphol wrote: | I see what you did there? | aabhay wrote: | I see what you see what I did there | Tenoke wrote: | I expect we'll see some very interesting, very big models | following it. I didn't dig too far into the code, but the library | looks very easy to use and will open up a lot of doors for people | who have a few or a few thousand GPUs. | ghawkescs wrote: | I must be missing it, where did you find a link to the code? | Tenoke wrote: | The code for the distributed training library, not the model | - https://github.com/microsoft/DeepSpeed/ | corporateslave5 wrote: | People are vastly underestimating the changes that are about to | come from NLP. The basic ideas of how to get language models | working are just about in place. Transformer networks, and recent | innovations like GPT-2, Google's Reformer model, etc., are | precursors to the real machine learning boom.
Machine learning as | we have known it has been stuck as an optimization tool, and | used for computer vision here and there. NLP, and with it, the | ability to create, synthesize, and understand content, will | change the internet. | | More than that, I think NLP will unlock new ways of interacting | with computers. Computers will be able to handle the ambiguity of | human language, transcending their rigid "only do exactly what | you tell them" models of the world. | | Edit: | | Adding this to give more technical context. I think most people | don't know where the line currently is between what's possible and | what's not, but also what we are on the cusp of. And we are on | the cusp of a lot. | | A quick explanation of one area is here: | | Basically, transformer models are the best for NLP. They use | something called attention-based mechanisms, which allow the | model to draw correlations between pieces of text/tokens that are | far apart. The issue is that this is an O(n^2) operation. So the | model is bounded by the context window, which is currently mostly | at 512 tokens, and is thus bounded in how much it can | understand. Recent innovations, and further study, will broaden | the context window, and thus unlock better reading comprehension | and context understanding. For instance, the ability to answer a | question using a piece of text is mostly stuck at just finding | one paragraph. The future will see models that can find multiple | different paragraphs, understand how they relate, pull the | relevant information, and synthesize it. This sounds like a minor | step forward, but it's important. This will unlock better | conversational abilities, but also better ways to understand how | different pieces of textual information relate. The scattershot | of information across the internet can go away. Computers can | better understand context to act on human intention through | language, unlocking the ability to handle ambiguity.
This will | change the internet. | | Again, to emphasize: these models only started showing up in 2017! | The progress has been rapid. | gzu wrote: | I can't wait for the day we see "deep-dream"-styled literary | works. | jandrese wrote: | There have been plenty of "AI-written" textual works, like | AI Dungeon for a D&D game, or the AI Recipe Generator | that attempts to make something that looks like a cooking | recipe. | | For the most part they aren't successful, because the "AI" | isn't smart enough to have a goal in mind, so they end up | just monkey-cheesing everything: pasting together snippets in | ways that are usually grammatically correct but make no | sense. | okigan wrote: | any books/papers that address "goal"-oriented NLP or hybrid | approaches? | ctoth wrote: | There's interesting research out of DeepMind on two parts | of this: using the Transformer model in | Reinforcement Learning contexts[0] and creating textual | GANs[1]. As you are probably aware, GANs are one of the | important tools that have driven forward image synthesis, | and until recently it was impossible to apply them to | text, so I expect this to push us quite a bit forward. | There's also ongoing work on the selection of the metric | to use to evaluate the generated text, and discriminate | between human- and machine-generated text. | | [0] https://arxiv.org/abs/1910.06764 | | [1] https://arxiv.org/abs/1905.09922 | jandrese wrote: | The ability to set your own goals and task yourself to | achieve them is the essence of AI. Not "AI" as we know it | today, but sci-fi AI where it's a machine person. | sgt101 wrote: | Is it? It's pretty clear that this kind of activity isn't | that common in humans - mostly we pick up our goals from | cues and drivers in our environment and society. | joe_the_user wrote: | Indeed, but that kind of describes the gulf between | current language processing and present AI.
Present AI | generates tokens that seem to have meaning, or seem like | an appropriate response to a statement, but where it | becomes evident after 2-3 paragraphs that there's no | substantial relation to either underlying meaning or | underlying goals. | | Part of this is that "underlying meaning" is an intuitive way | to describe things, but whatever is underlying here is | more tenuous than a classical logic/GOFAI model of the | world but more "solid" than a long, clever stream of | associations. | okigan wrote: | If one interprets "set your own goals" as the task of | AI, then it's probably sci-fi AI, but that wasn't my | question. | | Let's say we have a goal: "evaluate a person's impression of | a particular book/topic/etc." | | So the goal would be to have a conversation on this and | related topics that would (re)construct the person's | impression. | | Hence my question whether there are any publications/articles | that have explored that. | SQueeeeeL wrote: | But what would be the point... after the novelty wears off, | no one considers deep dream anything other than an outcome of | the system, not as art with an intended meaning (unless the | point is art without meaning) | the8472 wrote: | > More than that, I think NLP will unlock new ways of | interacting with computers. Computers will be able to handle | the ambiguity of human language, transcending their rigid "only | do exactly what you tell them" models of the world. | | Such as Google's Duplex? | codingslave wrote: | Yes, but Duplex doesn't work and was too early. | m0zg wrote: | I think you're the one who overestimates how much this will | affect NLP. I'd say the bulk of what was possible to deliver with | this is already here; the subsequent changes will be | incremental. | | The cold hard truth about statistical (and by extension, deep) | NLP is that it's mostly just a fancy way of counting numbers. | The only way to get to _real_ language understanding is AGI, | and _nobody_ is working on that.
You fundamentally cannot | interact comfortably with a human if your system does not have | probabilistic, contextualized cognition and can't incorporate | knowledge about the world. | corporateslave5 wrote: | I respectfully disagree. I've been in the field full-time for | a few years, and I watch the state of the art closely. It's hard | to see the thought/theoretical progression of deep | transformer models by just reading posts here and there. | | I'm not saying these NLP methods will be some kind of AI, | just that they will produce products, content, and ways of | interacting with the world that are categorically different | from what we have seen in the past. | | For instance, question-answering tasks have only recently | been able to: | | Find an answer in a text document that spans multiple non-contiguous | paragraphs | | Understand context across a whole book. | | The context window of current NLP is stuck at 512 tokens, | mostly because of computational complexity. This has been | broken just recently by the Reformer model, which is a | primitive, early way to get around the computational costs of | attention mechanisms. | | Just wait. The ideas are there. They just take time to | refine. | skywhopper wrote: | Since you seem to be observing this closely, I'm curious | what steps are being taken to identify and avoid bias in | the results generated by these systems? | p1esk wrote: | What exactly do you expect from these Transformer-based | models? This particular one is underwhelming because it | provides a tiny improvement over the previous largest one | (from Nvidia) at more than double the size. | | The fundamental limitation of these "optimization tools", as | you call them: they don't have any common sense, any | way to query an external source of information (e.g. | Wikipedia), or any way to ask a human to clarify. | | Another big problem is that we don't have any way to do quality | filtering on the outputs.
From my experiments with GPT-2, | it produces one interesting paragraph of text out of 20 - | if you squint at it really hard. And most of those 20 don't | make much sense at all. | | So no, the existing ideas are definitely not enough. Maybe | some novel hybrid of symbolic AI with statistical | optimization will lead to a breakthrough. This one does not | strike me as anything other than "let's use moar weights!!" | ctoth wrote: | Some papers which you might find interesting: | | Lin, B. Y., Chen, X., Chen, J., & Ren, X. (2019). KagNet: | Knowledge-Aware Graph Networks for Commonsense Reasoning. | 2822-2832. https://doi.org/10.18653/v1/d19-1282 | | Liu, W., Zhou, P., Zhao, Z., Wang, Z., Ju, Q., Deng, H., | & Wang, P. (2019). K-BERT: Enabling Language | Representation with Knowledge Graph. Retrieved from | http://arxiv.org/abs/1909.07606 | | Trinh, T. H., & Le, Q. V. (2019). Do Language Models Have | Common Sense? ICLR, 1-12. | | Ostendorff, M., Bourgonje, P., Berger, M., Moreno-Schneider, | J., Rehm, G., & Gipp, B. (2019). Enriching | BERT with Knowledge Graph Embeddings for Document | Classification. Retrieved from | http://arxiv.org/abs/1909.08402 | sanxiyn wrote: | > they don't have any common sense | | What do you mean by this? Of course they do, learned from | their training data. For example, here is a quote from | conversation 38 of https://github.com/google-research/google-research/blob/mast... | | Human: Do you like Korean food in general? Meena: It's | okay. I like beef bulgogi, but I'm not a huge fan of | kimchi. | | It seems to me Meena "knows" bulgogi and kimchi are | Korean foods. Isn't that common sense? If it isn't, what | do you mean by "common sense"? | sgt101 wrote: | I'd buy that if Meena could infer and reason about her | own answers. | | Human: Do you like Korean food in general? | | Meena: It's okay. I like beef bulgogi, but I'm not a huge | fan of kimchi. | | Human: OK, what should I shop for?
| | Meena: You've got almost everything, but you need a pear, | the steak, and some ginger. | | The problem with language models as common sense is that | they are collections of patterns and associations, and | that they don't have inference models or solvers - unlike | my dog, for example! | p1esk wrote: | Try asking it a follow-up question not commonly found in | the training data, such as "do you think bulgogi would | grow on Mars?", and see what kind of gibberish you will | get in response. Moreover, the model has no way of self-diagnosing | whenever it produces gibberish. | m0zg wrote: | I'm a researcher in the field. Not in NLP anymore, but I | worked on that as well, years ago, and I keep up with the | research. You can't "understand the context" of "War and | Peace" unless you have real, actual AGI. I doubt, actually, | it can be fully understood at all when translated to | English and read by someone without the right cultural | background. This is an extreme example, chosen to make it | easy to see that it applies to any non-trivial text. | | Let's take a question-answering example. Take just about | any recent deep learning paper and try to answer detailed, | higher-level questions against it. To use a concrete | example, take the MobileNetV3 paper and ask your system "do I | put the activation before or after squeeze-and-excitation" | (correct answer is "before"), or "do I need a bias in the | squeeze-and-excitation module" (correct answer is "it | depends on the task"). You won't be able to, because a lot | of things are just _assumed_, just like in any other | realistic example of text written for human consumption. | The facts are encoded externally as information about the | world, and they're so fine-grained and contextual that we | don't even know how to begin incorporating them into the | answers, let alone do so contextually and | probabilistically, like the human mind does. | corporateslave5 wrote: | Maybe.
I take a really optimistic view of attention-based | mechanisms, as I explained and added to my original | post. If you read the recent Reformer paper, they produce | a new way of computation to start building a model that | can, in some way, encode the relationships between | different parts of War and Peace. The bottleneck right | now is computation; we don't know how well these models | can learn when that bottleneck is removed! | | I'm optimistic because I believe that the contextual | information that you are describing is already there in | the vast expanse of the internet. | | But I will also add: I think none of this will spawn AI, | just that it will spawn new technologies that are | categorically different. | m0zg wrote: | I think it will merely spawn technologies that are less | fragile, yet still too fragile and unsophisticated to be | practical for the full-blown conversational user interface | the likes of which you see in the Avengers movies. It | will (and already does) make a difference in simpler tasks | where you can get away with just counting numbers, such | as search/retrieval, simple summarization, constrained | dialogue in chatbots, stuff like that. | Barrin92 wrote: | >Find an answer in a text document that spans multiple non-contiguous | paragraphs | | >Understand context across a whole book. | | To have much real utility when it comes to context, a model | doesn't just need to correlate information across some | text; it also needs general knowledge and understanding so | that it can produce knowledge which is only implicit or not | even present in the text itself. | | You can make the transformers as large as you want; NLP | models still fundamentally suck at answering trivial questions | like "If Pete was alive in 2000 and alive in 2050, was he | alive in 2025?"
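The O(n^2) attention cost that this subthread keeps returning to comes from the n-by-n score matrix of self-attention: every token scores against every other token. A minimal single-head NumPy sketch (dimensions chosen to echo the 512-token window discussed above; this illustrates the standard scaled dot-product formulation, not any particular model's code):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention. The `scores` matrix is (n, n):
    every token attends to every other token, which is the O(n^2)
    time/memory cost that bounds the context window."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (n, d)

n, d = 512, 64                       # context length, head dimension
x = np.random.randn(n, d)
out = attention(x, x, x)             # self-attention: q = k = v
print(out.shape)                     # -> (512, 64); scores were (512, 512)
```

Doubling the context from 512 to 1024 tokens quadruples the score matrix, which is why approaches like Reformer approximate attention rather than compute all n^2 pairs.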
| b3kart wrote: | I am sorry, but you come across as extremely over-enthusiastic, | without too many specifics beyond "we're just about to | figure it all out", "you just wait", and "it's gonna | revolutionize _everything_ ". We've seen this before with | ImageNet, didn't we? When everybody thought that because | ConvNets are crushing all the older methods, AI is right | around the corner. Well, it turned out to be much more | complicated than that, didn't it. Transformers are great | (well, if you have the compute, that is), don't get me wrong, | but let's not get ahead of ourselves. The field is over-hyped | as it is. | bonoboTP wrote: | > When everybody thought that because ConvNets are | crushing all the older methods, AI is right around the | corner. Well, it turned out to be much more complicated | than that, didn't it. | | I don't think anyone familiar with the area thought that | ConvNets would give us AGI. | | However, their effect has been huge! It's hard to | overstate this. Computer vision used to be a small niche | topic, with tons of effort required to get something | working even on simple images. The quality of today's | ConvNet predictions is way beyond anybody's imagination | around 2010. Models built around that time were like a | house of cards: extremely carefully crafted for specific | scenarios, where moving one threshold a bit would destroy | your output. | codingslave wrote: | I'm not saying this will spawn AI. Just that, like | computer vision was essentially solved by CNNs, NLP will | be solved by transformer models. | thfuran wrote: | >computer vision was essentially solved by CNNs | | That is a rather contentious claim. | b3kart wrote: | To say the least. | heavyarms wrote: | I would agree with you to a certain extent, but I still think | there is a big missing component to make this a reality.
Larger | and more accurate general language models are great, but to | enable use cases other than categorization, translation, | summarization, etc., there will almost certainly have to be a | contextualized knowledge-graph layer. This is basically what I | assume you mean when you say "the ability to create, | synthesize, and understand content." The way I see it, | transformer-based general language models will be the first and | last steps of an NLP system. In other words, they will do the raw | processing of the input and will do the "take this output and | put it in natural language based on this context" portion. | Automating the part in the middle - the part that actually | understands what the text means and can do logic with it - | there's still no progress on that, AFAIK. | | This is similar to how computer vision models work great in | most cases, but to build a self-driving car you still need all | the other components that do path planning, predicting what the | cars around you will do based on the state of the environment, | etc. | codingslave wrote: | Replying from my laptop. | | I agree, there needs to be a way to represent relationships | between pieces of information. I personally don't think knowledge | graphs will be the ones to do it, not because they don't work, | but because of how imperfect they are, in the data-quality | sense. | | See this paper here: | | "Differentiable Reasoning over a Virtual Knowledge Base" | https://openreview.net/pdf?id=SJxstlHFPH | | which is a recent effort, among many, by Google Research to | build a model that can view a document as a knowledge graph; | instead of explicitly tying pieces of the document to the | graph, the idea is to create a graph from the document. | This paper is a bit different from that - they do input a | knowledge graph for training - but I think the idea and the track | of where they are headed has a ton of room to evolve.
The | trick is that transformer models have unlocked the ability to | understand the text, so all of this "quasi knowledge graph | extraction" that I was just explaining is only recently | possible! There's no research on it, because the baseline | understanding of tokens has been too primitive. This is why | there is so much room to grow: BERT has unlocked new methods; | it can be used as a base for a ton of new NLP. | | Just to emphasize again, I'm not saying what I outlined above | will be a good way to do it, just that ideas like this could | only be tested recently. There are a million new ways to spin | this problem. | cfoster0 wrote: | Related: a recent paper from Antoine Bosselut and Yejin | Choi explores dynamically constructing a context-specific, | common-sense knowledge graph using a transformer, in the | context of question answering. | | https://arxiv.org/abs/1911.03876 | joe_the_user wrote: | _Computers will be able to handle the ambiguity of human | language, transcending their rigid "only do exactly what you | tell them" models of the world._ | | So, are there reasonable examples now of these models handling | semantic context? So far, what I have seen is generated text | where the lack of _understanding_ takes three paragraphs to | become obvious rather than one. | | Human language is this marvelous framework involving symbols | associating with other symbols as well as with well-known and | vaguely-guessed facts about the world. | | Human relations are very robust and, for example, two people | can have a longish conversation where, at the end, they realize | they're talking about two different people (or different days | or events). But in those circumstances, they can correct and | adjust. "Solid" understanding is there, but it's under a lot of | layers of social cues and protocols and multiple meanings. | [deleted] | lostcolony wrote: | >More than that, I think NLP will unlock new ways of | interacting with computers.
Computers will be able to handle | the ambiguity of human language, transcending their rigid "only | do exactly what you tell them" models of the world. | | As exciting as this sounds, I can't help but feel that, given | -we- haven't figured out how to handle the ambiguity of human | language, I'm not convinced a computer attempting to is really | markedly better for many use cases than requiring exactness. | But operating at a human level of 'understanding', and being | broadly accessible, may be enough to change the world. | Hopefully for the better. | lopmotr wrote: | Can you explain your reasoning? It's easy to imagine a new | invention or idea will revolutionize everything because it's | never been done before and feels powerful, but even if it | works, it might not. It sounds like you believe NLP will enable | more general voice control of computers than Siri/Alexa/etc. | But will that really be much more significant than people | expect? Google is already pretty good at understanding | ambiguous queries. It's eliminated the need to know where to | look for information. Or are you talking about the computer | writing code or doing business itself? | codingslave wrote: | Replying on my laptop. | | Basically, transformer models are the best for NLP. They use | something called attention-based mechanisms, which allow the | model to draw correlations between pieces of text/tokens that | are far apart. The issue is that this is an O(n^2) operation. | So the model is bounded by the context window, which is | currently mostly at 512 tokens, and is thus bounded in how | much it can understand. | | Recent innovations, and further study, will broaden the | context window, and thus unlock better reading comprehension | and context understanding. | | For instance, the ability to answer a question using a piece | of text is mostly stuck at just finding one paragraph.
The | future will see models that can find multiple different | paragraphs, understand how they relate, pull the relevant | information, and synthesize it. This sounds like a minor step | forward, but it's important. | | This will unlock better conversational abilities, but also | better ways to understand how different pieces of textual | information relate. The scattershot of information across the | internet can go away. Computers can better understand context | to act on human intention through language, unlocking the | ability to handle ambiguity. This will change the internet. | skywhopper wrote: | If this is really where the researchers think these tools | are headed (and I don't really doubt you on that point), | then this is incredibly dangerous stuff. No matter how good | your system is, the impact of implicit, unintentional, and | non-targeted bias is huge on the sorts of content these | systems will produce. But expose it to the levels of | intentional manipulation present on the Internet of today, | and these models don't stand a chance of producing | something that safely does what you claim. | | I'm happy to be wrong about this, but I'm not seeing any | discussion about the safety and security of using these | systems. And if it's not even being discussed, we can be | sure nothing's actually being done about it. Selling | promises of active-agent computers interpreting human | intent and summarizing information from the Internet | without addressing this concern is irresponsible at this | point. | K0SM0S wrote: | If what you claim is true, it's not just the internet that will | be changed by NLP but all of civilization! | Tenoke wrote: | > mostly at 512 tokens | | GPT-2 uses a 1024-token context window, which does work out to a | fair amount (given the massive vocab of 50k+, which means a | token can be more than a word), though of course it's pretty | limited.
Google's recent Reformer[0] allows you to do attention | much more cheaply, and I'm currently training a Reformer model | that isn't quite as big but has a context of ~64k tokens | (though a much smaller vocab). I'm not completely sure if this | is the solution, but it looks like a step in that direction, and | so far the model is doing pretty well (I also plan to post the | weights when I'm finished, though I am not sure Google doesn't | just plan to do that themselves). | | I am somewhat disappointed they went with just 1024 for this | model, too, though. | | 0. https://ai.googleblog.com/2020/01/reformer-efficient-transfo... | corporateslave5 wrote: | Yep, the number has been expanding. I just picked 512 because | BERT uses that. But yeah, 50k and beyond unlocks new ways of | using context. I can't wait to see how they optimize and | improve the Reformer model. | | I expect there to be an improvement like there was for | | BERT -> ALBERT | KKKKkkkk1 wrote: | Feels like I've heard this tune before. Actually, I think it | was first played in the 1950s. | skywhopper wrote: | Can you explain more about what you mean by "able to handle the | ambiguity of human language"? Such a claim goes far beyond | language processing itself. Such a capability would have to | involve cultural and interpersonal dynamics, domain knowledge, | and definitive agency on the part of the computer. | | And all of _that_ is just to interpret sincere, honest attempts | at communication. Can it safely and appropriately handle humor, | irony, and sarcasm? What about coordinated malicious attacks? | [deleted] | lowdose wrote: | This does GPT-2 x 10. For anyone wondering what GPT-2 is doing, | look at this baffling subreddit and marvel at how one GPT-2 model | trained for $70k spits out better comedy than everybody on the | payroll of Netflix combined. | | https://www.reddit.com/r/SubSimulatorGPT2/ | jeromebaek wrote: | gosh, this is unsettling.
my brain is literally getting stuck | in an infinite loop trying to read this and coherently put it | together | woodgrainz wrote: | Comedy is right. | | "I'm really starting to get worried about my Higgs Boson (HBN) | after watching some videos on YouTube" [0] | | "This repost and all of your posts are garbage." [1] | | "It's the most random and unoriginal shit I've ever read." [2] | | [0] | https://www.reddit.com/r/SubSimulatorGPT2/comments/f1sqyh/an... | | [1] | https://www.reddit.com/r/SubSimulatorGPT2/comments/f1vfnv/wh... | | [2] | https://www.reddit.com/r/SubSimulatorGPT2/comments/f1o83a/fo... | paganel wrote: | There's nothing inherently funny about entries like this one | [1]. I mean, there is, as in it is sort of funny how the AI got | tricked so quickly into doing incest jokes, but I guess that | was not the research team's intended goal. | | [1] | https://old.reddit.com/r/SubSimulatorGPT2/comments/f1ifp6/my... | RicardoLuis0 wrote: | if you look closer at the usernames, each "bot" is trained | on a specific subreddit, and taking the subreddit | context into account, for this one post in particular | (r/twosentencehorror), I'd say it isn't half-bad. | istjohn wrote: | Simulating r/StonerPhilosophy[1] | | Post: Do we live in a simulation? | | Comment: I just realized, we are a simulation, and we are a | simulated simulation. | | Comment: We're all in a simulation. We're still here. We're all | in this little ball together | | Comment: The simulation hypothesis states that we are in a | simulation. Which means that there is a possibility that we are | not in a simulation. | | [1] | https://www.reddit.com/r/SubSimulatorGPT2/comments/ez6qtj/do... | SkyBelow wrote: | My favorite so far is the science bot. 75% of the top-level | posts are explaining why the submission was removed. | minimaxir wrote: | GPT-2 x 10 is misleading; the model size is 10x, sure, but | that doesn't _necessarily_ mean the output will be 10x better.
| For r/SubSimulatorGPT2, it did go from 355M to 1.5B recently, | but the quality isn't necessarily 4x (although it did improve). | | I'm more interested in _shrinking_ models that maintain the | same level of generative robustness (e.g. distillation, as with | distilGPT2). | speedgoose wrote: | It's definitely better than the original using Markov chains. | It fits this use case very well, and in my opinion only this | use case. | | GPT-2 is still very random and quite stupid. | | You start it with your love for your girlfriend as the context, | and she becomes a cam girl into hardcore anal two paragraphs | later. You start with religion: "Muslims must be exterminated". | You start with software, and you get a description of non-existent | hardware with instructions about how to set up a VPN in | the middle. You start with news, and you can read that China | supports the Islamic State. | | That's cool because it has more context than Markov chains, | which usually have only 3 words of context, but it still has a | long way to go before I trust anything generated by this kind | of algorithm. | dilap wrote: | This stuff is pretty much indistinguishable from the real | thing... | | https://www.reddit.com/r/SubSimulatorGPT2/comments/f1pypf/so... | undoware wrote: | As a trans woman I can't help but notice that you rank me | being your girlfriend at about the same level of beyond-the-pale-ness | as a genocide. | | The feeling is mutual | speedgoose wrote: | Sorry, I didn't think one could interpret it like this; it was | inappropriate indeed. I edited. | danharaj wrote: | If your model has 17 billion parameters, you missed some. ___________________________________________________________________ (page generated 2020-02-10 23:00 UTC)