[HN Gopher] Announcing GPT-NeoX-20B
       ___________________________________________________________________
        
       Announcing GPT-NeoX-20B
        
       Author : jscob
       Score  : 138 points
       Date   : 2022-02-02 16:03 UTC (6 hours ago)
        
 (HTM) web link (blog.eleuther.ai)
 (TXT) w3m dump (blog.eleuther.ai)
        
       | fpgaminer wrote:
        | So excited for this release. In the wake of AI Dungeon's
        | downfall, having GPT-Neo to fall back on has been a saving
        | grace. While the 6B model is nowhere near as good as the
        | original AI Dungeon, which used OpenAI's 175B model, it was at
        | least serviceable, unlike the "gentled" AI Dungeon. And you
        | could run it
       | locally or through Colab, which was really cool. I ended up using
       | it through NovelAI, since they've spent a lot of time fine-tuning
       | the model and adding a plethora of features that end up improving
       | the overall output. (NovelAI's interface is like AI Dungeon on
       | steroids!) But there is a vibrant community of Colab notebooks
       | and other tools for DIYers surrounding the GPT-Neo model.
       | 
       | That said, besides being overall "dumber" than 175B GPT-3, the 6B
       | model was missing a critical feature: prompting. 175B GPT-3 could
       | be "prompted" to write things. For example, you could give it
       | "Write a story about cyberpunk gnomes:" and it would go on to do
       | just that, all on its own. GPT-Neo didn't really have that
       | capability in my experience. The only way to get it to reliably
        | write such a story was to begin writing it yourself, at which
        | point GPT-Neo could help continue the story.
       | 
       | So I'm excited to see not just how much "smarter" Eleuther's new
       | 20B model is, but also if it has attained that coveted prompting
       | ability. Given the non-linear relationship between parameters and
       | loss, my hopes are high.
       | 
       | P.S. NovelAI recently added the Fairseq 13B model to their
       | repertoire. I haven't had a chance to try it personally, but I've
        | heard positive things about it. My bet is on GPT-NeoX-20B being
       | better still.
        
         | qlm wrote:
         | What was AI Dungeon's downfall? Can't find much about it.
        
           | minimaxir wrote:
            | tl;dr AI Dungeon was required to add additional content
            | filters after it went too far off the rails, which caused
           | community backlash.
           | 
           | https://www.wired.com/story/ai-fueled-dungeon-game-got-
           | much-...
        
             | fpgaminer wrote:
              | It was more than that. They also significantly downgraded
              | the model. I didn't follow the details, but IIUC Dragon
              | initially used the 175B model directly, then I think they
              | went down a model size at OpenAI's behest. Finally, when
              | OpenAI announced pricing, AI Dungeon had to downgrade the
              | model further.
             | 
              | But yes, the content filtering got out of hand too. I was
              | initially fine with it, as its stated intention was to
              | filter out genuinely illegal stuff, like underage content. I
             | rarely hit the filter. But then they tweaked it at some
             | point and I was triggering it constantly on otherwise
             | benign stuff.
             | 
             | And they broke features constantly.
             | 
              | When I unsubbed, the state of AID was broken features,
              | micro-transactions, a terrible AI model, and a glitchy,
              | puritanical content filter.
             | 
             | The plus side is that it made the puny GPT-Neo model look
             | like a godsend.
        
               | causi wrote:
               | _really illegal stuff, like underage content_
               | 
               | Wait, isn't this output just text? How is a text AI
               | generating illegal content?
        
               | NovemberWhiskey wrote:
               | The content may not be illegal to possess, but if it's
               | obscene, then it can be illegal to sell it, produce it
               | with the intention of selling it, transport it,
               | distribute it, and so on.
        
               | capableweb wrote:
               | Could it really? I was under the impression that unless
               | you incite someone to commit crimes (or confess to
               | crimes), the story would be covered under "art" and
                | therefore protected. It's just text, after all. Where is
                | the line for "obscene" drawn?
        
               | NovemberWhiskey wrote:
               | In the U.S., it's called the Miller test:
               | https://en.wikipedia.org/wiki/Miller_test
        
               | capableweb wrote:
                | Wow, I had no idea; that sounds really bad. The whole
                | book-banning debacle now makes sense and seems legal.
                | That test seems to give courts leeway to judge basically
                | however they want, as all three criteria are very
                | subjective.
                | 
                | Also, this is the first time I've heard of "patently
                | offensive", and now I'm laughing. Thanks!
        
               | causi wrote:
               | It's very funny to imagine picking up a romance novel and
               | making it illegal by scrawling "by the way the girl was
               | actually 16 the whole time" on the inside of the back
               | cover.
        
               | miohtama wrote:
               | It's called thoughtcrime
               | 
               | https://www.quora.com/Do-thought-crimes-exist-in-U-S-law
        
               | bitforger wrote:
               | I believe they're currently using AI21's 178B Jumbo model
               | for Dragon. Since they're completely off of OpenAI now,
               | the content filter is much more lax.
        
         | benjismith wrote:
         | The "prompting" ability you're referring to is called
         | "instruction following", and here are some descriptions of it.
         | 
         | https://openai.com/blog/instruction-following/
         | 
          | I think the differences lie more in the training data used
          | than in the nature of the model itself. So you could probably
         | train your own instruction-following model on top of this raw
         | 20B model.
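          | 
          | A minimal sketch of what that fine-tuning could look like
          | with the Hugging Face transformers Trainer. The model ID,
          | the one-example dataset, and the hyperparameters here are
          | illustrative assumptions, and a 20B model realistically
          | needs several large GPUs (or parameter-efficient methods)
          | rather than this naive setup:
          | 
          |     from transformers import (AutoModelForCausalLM,
          |                               AutoTokenizer, Trainer,
          |                               TrainingArguments)
          | 
          |     name = "EleutherAI/gpt-neox-20b"  # assumed model ID
          |     tok = AutoTokenizer.from_pretrained(name)
          |     model = AutoModelForCausalLM.from_pretrained(name)
          | 
          |     # One instruction/response pair, formatted so a bare
          |     # prompt is followed by the desired completion.
          |     pairs = [("Write a story about cyberpunk gnomes:",
          |               " The gnomes jacked in at dusk...")]
          | 
          |     def encode(prompt, completion):
          |         ids = tok(prompt + completion + tok.eos_token,
          |                   truncation=True,
          |                   max_length=512)["input_ids"]
          |         return {"input_ids": ids, "labels": list(ids)}
          | 
          |     data = [encode(p, c) for p, c in pairs]
          |     args = TrainingArguments(
          |         output_dir="neox-instruct",
          |         per_device_train_batch_size=1,
          |         num_train_epochs=1)
          |     Trainer(model=model, args=args,
          |             train_dataset=data).train()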
        
         | f38zf5vdt wrote:
         | You can try the model out for free at goose.ai after making an
         | account and going to the sandbox section.
        
       | gibsonf1 wrote:
       | How is pattern matching ever inference when there is no reference
       | to the underlying computational model of what the words mean in
       | spacetime?
       | 
       | How is it helpful to see what word might come next when the word
       | sequence is just based on statistics with no reference at all to
       | meaning?
        
         | ChefboyOG wrote:
         | Have you ever used any sort of autocomplete?
        
           | gibsonf1 wrote:
            | Yes, and I very much like it when quickly selecting from a
            | set of valid options.
           | 
           | This is not that. It is all A with no I.
        
         | drxzcl wrote:
          | Humans assign a lot of, well, meaning to meaning. It turns out
          | that you can get a really good score on tasks you would
          | superficially think require actual understanding, without
          | programming any of that in.
          | 
          | Does this mean the neural network has learned about meaning?
          | Or has it just gotten really good at faking it? Does it mean
          | that meaning itself doesn't really exist, and it's just a
          | shorthand for advanced pattern matching? Does it matter?
         | 
         | Honestly, we don't know. But we've been thinking about it for a
         | very long time. See for example the famous Chinese Room thought
         | experiment:
         | 
         | https://en.wikipedia.org/wiki/Chinese_room
        
           | gibsonf1 wrote:
            | Try driving a car around without both a conceptual and a
            | causal-systems understanding of the world; meaning matters
            | for survival.
        
       | f38zf5vdt wrote:
        | Right on, they're closing in on "Open"AI's best models. Can
        | this still be run on a single GPU, or does it require a lot
        | more VRAM?
        
         | stellaathena wrote:
         | It can be run on an A40 or A6000, as well as the largest A100s.
         | But other than that, no.
        
           | bm-rf wrote:
            | You could use Microsoft's DeepSpeed to run the model for
            | inference on multiple GPUs, see
           | https://www.deepspeed.ai/tutorials/inference-tutorial/
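            | 
            | A rough sketch of that setup, adapted from the linked
            | tutorial. The model ID and the GPU count are assumptions;
            | the script would be launched with something like
            | `deepspeed --num_gpus 2 run_neox.py`:
            | 
            |     import torch
            |     import deepspeed
            |     from transformers import (AutoModelForCausalLM,
            |                               AutoTokenizer)
            | 
            |     name = "EleutherAI/gpt-neox-20b"  # assumed model ID
            |     tok = AutoTokenizer.from_pretrained(name)
            |     model = AutoModelForCausalLM.from_pretrained(
            |         name, torch_dtype=torch.float16)
            | 
            |     # Shard the weights across the GPUs with tensor
            |     # (model) parallelism and fused inference kernels.
            |     engine = deepspeed.init_inference(
            |         model, mp_size=2, dtype=torch.half,
            |         replace_with_kernel_inject=True)
            |     model = engine.module
            | 
            |     prompt = "Write a story about cyberpunk gnomes:"
            |     ids = tok(prompt, return_tensors="pt").input_ids
            |     out = model.generate(ids.to("cuda"),
            |                          max_new_tokens=100)
            |     print(tok.decode(out[0]))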
        
           | djoldman wrote:
           | How much VRAM does it use during inference?
        
             | stellaathena wrote:
             | ~40 GB with standard optimization. I suspect you can shrink
             | it down more with some work, but it would require
              | significant innovation to cram it into the next-largest
              | common card size (24 GB, unless I'm misremembering).
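              | 
              | For context, a back-of-the-envelope check on where the
              | ~40 GB comes from, assuming the weights are held in
              | 16-bit floats:
              | 
              |     params = 20_000_000_000     # 20B parameters
              |     bytes_per_param = 2         # fp16 / bf16
              |     gb = params * bytes_per_param / 1e9
              |     print(gb)  # ~40 GB for the weights alone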
        
               | komuher wrote:
                | Is that 40 GB figure already in float16?
        
       | benjismith wrote:
       | I'm super excited about this!
       | 
        | I'm on the cusp of releasing a model into production that was
        | fine-tuned on your 6B model, and the results are quite
        | excellent. I'd be very curious to try out the 20B model the next
        | time we retrain.
        | 
        | Are there any other architectural differences in this release
        | (number of layers, number of attention heads, etc.) compared
        | with the 6B model, or does it simply scale up the number of
        | parameters?
        
       | guidovranken wrote:
        | _GPT-NeoX-20B will be publicly downloadable from The Eye on the
        | 9th of February._
       | 
       | The Eye as in the-eye.eu? That site has been down for a long
       | time.
        
         | stellaathena wrote:
         | There is a mirror at https://mystic.the-eye.eu/ that has been
         | up for a long time.
        
           | drusepth wrote:
           | Thanks for this. When the-eye.eu went down it broke a ton of
           | my Colab notebooks and it was impossible to find a mirror.
        
       | dash2 wrote:
       | Does anyone know whether the spammy websites that sit at the top
       | of search engine results are already generated by this kind of
       | model?
        
         | nefitty wrote:
          | That's a use case, but I don't see why anyone would go out of
          | their way to make intelligible content for spam. Google is so
          | broken right now that SEO hacks are easy to generate. Not to
          | overstress the tangent, but without search operators, I have to
          | sift through pointless GitLab/GitHub/StackOverflow/Wikipedia
          | clones all the time.
        
         | ChefboyOG wrote:
         | By and large, no.
         | 
          | That's not to say that those sites are not generated
          | programmatically (without a doubt, most of them are), but not
          | by a cutting-edge transformer model. The fact is, generating
          | words has never been the bottleneck for blackhat SEO types.
         | Generally, those sites are generating their content through
         | some kind of scraping, or in rarer cases, paying pennies for
         | nonsense articles. The page itself is structured for search
         | (targeted H1s, metadata, etc.) and some kind of private blog
         | network is used to create a pyramid of backlinks.
        
       | trasz wrote:
       | So, what does it do?
        
       | [deleted]
        
       | dqpb wrote:
       | EleutherAI is the real open ai.
        
       | btdmaster wrote:
       | Awesome! Any chance for an online demo (like
       | https://6b.eleuther.ai/)?
        
         | stellaathena wrote:
         | Coming soon!
        
           | schleck8 wrote:
           | Awesome, thanks for your work Stella & team!
        
         | [deleted]
        
         | terafo wrote:
          | The best option right now is the playground at
          | https://goose.ai/
        
           | stavros wrote:
           | Which unfortunately doesn't work properly on Firefox (spaces
           | are removed).
        
       | nefitty wrote:
        | Thank you to everyone who has worked on this. EleutherAI has
        | become a touchstone in my mind for what is possible in open data
        | and code. In creating alternatives to closed gardens, they have
        | shown me new possible paths. I know Linux has done the same for
        | others.
        | 
        | Huggingface has also made playing with this stuff super
        | accessible. They've made me super curious about Rust and AI/ML
        | research, which has influenced my personal engineering goals for
        | the future. I am on your team, Roko's Basilisk.
        
         | monkeydust wrote:
          | Shout out to Huggingface. As a business user, I've been able
          | to explore use cases around text summarisation very easily,
          | and it has given me ideas for future work. I clearly need to
          | check out
         | EleutherAI as well.
        
         | rsync wrote:
         | I came to this thread looking for comments that I would suspect
         | were machine generated.
         | 
         | I was not disappointed.
        
           | coolspot wrote:
           | Good bot
        
             | nefitty wrote:
             | Beep beep. That means thank you in my motherboard.
        
       ___________________________________________________________________
       (page generated 2022-02-02 23:00 UTC)