[HN Gopher] Announcing GPT-NeoX-20B
___________________________________________________________________

Announcing GPT-NeoX-20B

Author : jscob
Score  : 138 points
Date   : 2022-02-02 16:03 UTC (6 hours ago)

web link: blog.eleuther.ai

| fpgaminer wrote:
| So excited for this release. In the wake of AI Dungeon's downfall, having GPT-Neo to fall back on has been a saving grace. While the 6B model is nowhere near as good as the original AI Dungeon, which used OpenAI's 175B model, it was at least serviceable, unlike the "gentled" AI Dungeon. And you could run it locally or through Colab, which was really cool. I ended up using it through NovelAI, since they've spent a lot of time fine-tuning the model and adding a plethora of features that improve the overall output. (NovelAI's interface is like AI Dungeon on steroids!) But there is also a vibrant community of Colab notebooks and other tools for DIYers surrounding the GPT-Neo model.
|
| That said, besides being overall "dumber" than 175B GPT-3, the 6B model was missing a critical feature: prompting. 175B GPT-3 could be "prompted" to write things. For example, you could give it "Write a story about cyberpunk gnomes:" and it would go on to do just that, all on its own. GPT-Neo didn't really have that capability in my experience. The only way to get it to reliably write such a story was to begin writing it yourself, at which point GPT-Neo could help continue the story.
|
| So I'm excited to see not just how much "smarter" Eleuther's new 20B model is, but also whether it has attained that coveted prompting ability. Given the non-linear relationship between parameters and loss, my hopes are high.
|
| P.S. NovelAI recently added the Fairseq 13B model to their repertoire. I haven't had a chance to try it personally, but I've heard positive things about it. My bet is on GPT-NeoX-20B being better still.

| qlm wrote:
| What was AI Dungeon's downfall? Can't find much about it.

| minimaxir wrote:
| tl;dr AI Dungeon was required to add additional content filters after it went too far off the rails, which caused community backlash.
|
| https://www.wired.com/story/ai-fueled-dungeon-game-got-much-...

| fpgaminer wrote:
| It was more than that. They also significantly downgraded the model. I didn't follow the details, but IIUC Dragon initially used the 175B model directly, then I think they went down a model size at OpenAI's behest. Finally, when OpenAI announced pricing, AI Dungeon had to downgrade the model further.
|
| But yes, the content filtering got out of hand too. I was initially fine with it, as its stated intention was to filter out really illegal stuff, like underage content. I rarely hit the filter. But then they tweaked it at some point and I was triggering it constantly on otherwise benign stuff.
|
| And they broke features constantly.
|
| When I unsubbed, the state of AID was broken features, micro-transactions, a terrible AI model, and a glitchy, puritanical content filter.
|
| The plus side is that it made the puny GPT-Neo model look like a godsend.

| causi wrote:
| _really illegal stuff, like underage content_
|
| Wait, isn't this output just text? How is a text AI generating illegal content?

| NovemberWhiskey wrote:
| The content may not be illegal to possess, but if it's obscene, then it can be illegal to sell it, produce it with the intention of selling it, transport it, distribute it, and so on.
| capableweb wrote:
| Could it really? I was under the impression that unless you incite someone to commit crimes (or confess to crimes), the story would be covered under "art" and therefore protected. It's just text, after all. Where does the limit for "obscene" lie?

| NovemberWhiskey wrote:
| In the U.S., it's called the Miller test: https://en.wikipedia.org/wiki/Miller_test

| capableweb wrote:
| Wow, I had no idea; that sounds really bad. The whole book-banning debacle now makes sense and seems legal. That test seems to give courts room to judge basically however they want, as all three criteria are very subjective.
|
| Also, this is the first time I've heard of "patently offensive", and now I'm laughing. Thanks!

| causi wrote:
| It's very funny to imagine picking up a romance novel and making it illegal by scrawling "by the way the girl was actually 16 the whole time" on the inside of the back cover.

| miohtama wrote:
| It's called thoughtcrime
|
| https://www.quora.com/Do-thought-crimes-exist-in-U-S-law

| bitforger wrote:
| I believe they're currently using AI21's 178B Jumbo model for Dragon. Since they're completely off of OpenAI now, the content filter is much more lax.

| benjismith wrote:
| The "prompting" ability you're referring to is called "instruction following", and here are some descriptions of it.
|
| https://openai.com/blog/instruction-following/
|
| I think the differences lie more in the training data used than in the nature of the model itself. So you could probably train your own instruction-following model on top of this raw 20B model.

| f38zf5vdt wrote:
| You can try the model out for free at goose.ai after making an account and going to the sandbox section.

| gibsonf1 wrote:
| How is pattern matching ever inference when there is no reference to the underlying computational model of what the words mean in spacetime?
|
| How is it helpful to see what word might come next when the word sequence is just based on statistics, with no reference at all to meaning?

| ChefboyOG wrote:
| Have you ever used any sort of autocomplete?

| gibsonf1 wrote:
| Yes, and I very much like it when quickly selecting from a scope of valid selections.
|
| This is not that. It is all A with no I.

| drxzcl wrote:
| Humans assign a lot of, well, meaning to meaning. It turns out that you can get a really good score on tasks that would superficially seem to require actual understanding, without programming any of that in.
|
| Does this mean the neural network has learned about meaning? Does it mean that it has just gotten really good at faking it? Does it mean that meaning itself doesn't really exist, and it's just a shorthand for advanced pattern matching? Does it matter?
|
| Honestly, we don't know. But we've been thinking about it for a very long time. See for example the famous Chinese Room thought experiment:
|
| https://en.wikipedia.org/wiki/Chinese_room

| gibsonf1 wrote:
| Try driving a car around without both a conceptual and a causal-systems understanding of the world - meaning matters for survival.

| f38zf5vdt wrote:
| Right on, they're closing in on "Open"AI's best models. Can this still be run on a GPU, or does it require a lot more VRAM?

| stellaathena wrote:
| It can be run on an A40 or A6000, as well as the largest A100s. But other than that, no.
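For context on the hardware named above: the fp16 weights of a 20B-parameter model alone take roughly 20e9 params x 2 bytes, or about 40 GB, which is why 40-48 GB cards (A100, A40, A6000) are the floor for single-GPU inference. Below is a minimal sketch of what loading and prompting the model might look like through the transformers library; it assumes the checkpoint eventually gets published to the Hugging Face Hub under an id like EleutherAI/gpt-neox-20b, which is a hypothetical placeholder here, since the thread only announces a direct download.

    # Minimal single-GPU inference sketch. Assumes a >=40 GB card and a
    # Hub-published checkpoint; the model id below is an assumption, not
    # something confirmed by the announcement.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "EleutherAI/gpt-neox-20b"  # hypothetical Hub id
    tok = AutoTokenizer.from_pretrained(name)
    # fp16 halves memory vs fp32: ~40 GB of weights instead of ~80 GB.
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.float16
    ).to("cuda")

    # The "prompting" test from the thread: a bare instruction, no story stub.
    prompt = "Write a story about cyberpunk gnomes:"
    ids = tok(prompt, return_tensors="pt").input_ids.to("cuda")
    out = model.generate(ids, max_new_tokens=200, do_sample=True,
                         temperature=0.8, top_p=0.9)
    print(tok.decode(out[0], skip_special_tokens=True))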
| bm-rf wrote:
| You could use Microsoft's DeepSpeed to run the model for inference on multiple GPUs; see https://www.deepspeed.ai/tutorials/inference-tutorial/

| djoldman wrote:
| How much VRAM does it use during inference?

| stellaathena wrote:
| ~40 GB with standard optimization. I suspect you can shrink it down more with some work, but it would require significant innovation to cram it into the next largest common card size (24 GB, unless I'm misremembering).

| komuher wrote:
| Is that 40 GB already in float16?

| benjismith wrote:
| I'm super excited about this!
|
| I'm on the cusp of releasing a model into production that was fine-tuned on your 6B model, and the results are quite excellent. I'd be very curious to try out the 20B model the next time we retrain.
|
| Are there any other differences in this release (number of layers, number of attention heads, etc.) compared with the 6B model, or does it simply scale up the number of parameters?

| guidovranken wrote:
| GPT-NeoX-20B will be publicly downloadable from The Eye on the 9th of February.
|
| The Eye as in the-eye.eu? That site has been down for a long time.

| stellaathena wrote:
| There is a mirror at https://mystic.the-eye.eu/ that has been up for a long time.

| drusepth wrote:
| Thanks for this. When the-eye.eu went down it broke a ton of my Colab notebooks and it was impossible to find a mirror.

| dash2 wrote:
| Does anyone know whether the spammy websites that sit at the top of search engine results are already generated by this kind of model?

| nefitty wrote:
| That's a use case, but I don't see why anyone would go out of their way to make intelligible content for spam. Google is so broken right now that SEO hacks are easy to generate. Not to overstress the tangent, but without search operators, I have to sift through pointless Gitlab/Github/Stackoverflow/Wikipedia clones all the time.

| ChefboyOG wrote:
| By and large, no.
|
| That's not to say that those sites are not generated programmatically--without a doubt, most of them are--but not by a cutting-edge transformer model. The fact is, generating words has never been the bottleneck for blackhat SEO types. Generally, those sites generate their content through some kind of scraping or, in rarer cases, by paying pennies for nonsense articles. The page itself is structured for search (targeted H1s, metadata, etc.) and some kind of private blog network is used to create a pyramid of backlinks.

| trasz wrote:
| So, what does it do?

| [deleted]

| dqpb wrote:
| EleutherAI is the real open AI.

| btdmaster wrote:
| Awesome! Any chance of an online demo (like https://6b.eleuther.ai/)?

| stellaathena wrote:
| Coming soon!

| schleck8 wrote:
| Awesome, thanks for your work Stella & team!

| [deleted]

| terafo wrote:
| The best there is right now is the playground at https://goose.ai/

| stavros wrote:
| Which unfortunately doesn't work properly on Firefox (spaces are removed).

| nefitty wrote:
| Thank you to everyone who has worked on this. EleutherAI has become a touchstone in my mind for what is possible in open data and code. In creating alternatives to closed gardens, they have shown me new possible paths. I know Linux has done the same for others.
|
| Huggingface has also made playing with this stuff super accessible. They've made me super curious about Rust and AI/ML research, which has influenced my personal engineering goals for the future. I am on your team, Roko's Basilisk.
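Following up on the DeepSpeed suggestion above: the linked inference tutorial shards a model's weights across GPUs with tensor parallelism, which is how ~40 GB of fp16 weights could be split over two 24 GB cards. A rough sketch under those assumptions follows; as before, the Hub id is a hypothetical placeholder, and the exact card count and memory headroom are assumptions, not measurements.

    # Rough sketch of multi-GPU inference with DeepSpeed's inference engine,
    # per https://www.deepspeed.ai/tutorials/inference-tutorial/. Assumes two
    # GPUs and a transformers-compatible, Hub-published checkpoint.
    # Launch with: deepspeed --num_gpus 2 infer.py
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "EleutherAI/gpt-neox-20b"  # hypothetical Hub id
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)

    # Shard the ~40 GB of fp16 weights across 2 GPUs (~20 GB each), swapping
    # supported layers for DeepSpeed's fused inference kernels.
    model = deepspeed.init_inference(model, mp_size=2, dtype=torch.half,
                                     replace_method="auto").module

    ids = tok("GPT-NeoX-20B is", return_tensors="pt").input_ids.to("cuda")
    print(tok.decode(model.generate(ids, max_new_tokens=50)[0]))

Note that ~20 GB of weights per card leaves little headroom on 24 GB GPUs once activations and the attention cache are counted, which is consistent with the "significant innovation" caveat above.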
| monkeydust wrote:
| Shout out to Huggingface. As a business user, it has allowed me to explore use cases around text summarisation very easily and has given me ideas for future work. I clearly need to check out EleutherAI as well.

| rsync wrote:
| I came to this thread looking for comments that I would suspect were machine generated.
|
| I was not disappointed.

| coolspot wrote:
| Good bot

| nefitty wrote:
| Beep beep. That means thank you in my motherboard.
___________________________________________________________________
(page generated 2022-02-02 23:00 UTC)