Editorials

GPT-3 is No Longer the Only Game in Town

GPT-3 was by far the largest AI model of its kind last year. Now? Not so much.

Nov 6

Welcome to the fourth editorial from Last Week in AI! This is the last of our free editorials, and we hope you will consider subscribing to our Substack to get access to future ones and to support us. We'd really appreciate your support, and we are offering a steep discount to make it easy for you to help us out: Get 50% off for 1 year. Money aside, you can also support us by following us on Twitter, checking out our podcast, and sharing this post. Thanks!

---------------------------------------------------------------------

[Image: Why GPT-3 Matters | Leo Gao (source)]

TLDR: Organizations face significant challenges in creating a model similar to OpenAI's GPT-3, but nevertheless a half dozen or so models as big as or bigger than GPT-3 have been announced over the course of 2021.

GPT-3 is No Longer the Only Game in Town

It's safe to say that OpenAI's GPT-3 has made a huge impact on the world of AI. Quick recap: GPT-3 is a huge AI model that is really good at many "text-in, text-out" tasks (lengthier explanations can be found here, here, here, or in dozens of other write-ups). Since being released last year, it has inspired many researchers and hackers to explore how to use and extend it; the paper that introduced GPT-3 is now cited by more than 2,000 papers (that's a LOT), and OpenAI claims more than 300 applications use it.

However, people's ability to build upon GPT-3 has been hampered by one major factor: it was not publicly released. Instead, OpenAI opted to commercialize it and only provide access via a paid API (although, just this past week, it has also become available on Microsoft Azure). This made sense given OpenAI's for-profit nature, but it went against the common practice of AI researchers releasing models for others to build upon. So, since last year, multiple organizations have worked towards creating their own versions of GPT-3, and as I'll go over in this article, roughly half a dozen such gigantic GPT-3-esque models have been developed at this point (though, as with GPT-3, not yet publicly released).

[Image: From the OpenAI API page]

Creating your own GPT-3 is nontrivial for several reasons. First, there is the compute power needed. The largest variant of GPT-3 has 175 billion parameters, which take up 350GB of space, meaning that dozens of GPUs would be needed just to run it and many more would be needed to train it. For reference, OpenAI has worked with Microsoft to create a supercomputer with 10,000 GPUs and 400 gigabits per second of network connectivity per server. Even with this sort of compute power, such models reportedly take months to train. Then there is the massive amount of data required: GPT-3 was trained on about 45 terabytes of text data from all over the internet, which translates to roughly 181 billion English words (181,014,683,608, to be precise) plus many more in other languages (though this came from filtered, publicly available datasets), further exacerbating the need for expensive compute to handle it all. Taken together, these factors mean that GPT-3 could easily have cost 10 or 20 million dollars to train (exact numbers are not available).
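To put those numbers in perspective, here is a minimal back-of-the-envelope sketch (not a description of OpenAI's actual setup) of why just serving the largest GPT-3 variant calls for dozens of GPUs. It assumes half-precision weights at 2 bytes per parameter and a hypothetical GPU with 32GB of usable memory; both are illustrative assumptions rather than figures from the paper.

```python
import math

# Illustrative assumptions (not from OpenAI): fp16 weights at 2 bytes per
# parameter, and a hypothetical GPU with 32 GB of usable memory.
PARAMS = 175e9          # largest GPT-3 variant: 175 billion parameters
BYTES_PER_PARAM = 2     # half precision (fp16)
GPU_MEMORY_GB = 32      # hypothetical accelerator

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_for_weights = math.ceil(weights_gb / GPU_MEMORY_GB)

print(f"Weights alone: ~{weights_gb:.0f} GB")            # ~350 GB, matching the text
print(f"GPUs just to hold the weights: {gpus_for_weights}")
```

And that is only inference: training additionally needs memory for gradients and optimizer state (often several more copies of the parameters), plus enough throughput to churn through tens of terabytes of text, which is how the bill climbs into the millions of dollars.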
Previous large (though not as large as GPT-3) language models such as GPT-2, T5, Megatron-LM, and Turing-NLG were similarly costly and difficult to train.

[Image: GPT-3 is far larger than previous similar models (source)]

Nevertheless, it was only a matter of time before GPT-3 was successfully recreated (with some tweaks) by others. Surprisingly, one of the earlier efforts to release results came from a grassroots group of volunteers rather than a company with immense amounts of money like OpenAI. The group in question is EleutherAI, "a grassroots collective of researchers working to open source AI research." They first released a dataset similar to the one OpenAI used to train GPT-3, which they named The Pile. Next came GPT-Neo 1.3B and 2.7B (B meaning billions), smaller-scale versions of GPT-3, followed most recently by a 6-billion-parameter version called GPT-J-6B (see the short usage sketch at the end of this article). All of this was done by volunteers working together over Discord (plus some generous donations of cloud computing credits).

[Image: The start of EleutherAI's journey (source)]

Meanwhile, other groups were also working towards their own versions of GPT-3. A group of Chinese researchers from Tsinghua University and BAAI released the Chinese Pretrained Language Model (CPM) about 6 months after GPT-3 came out. This is a 2.6-billion-parameter model trained on 100GB of Chinese text, still far from the scale of GPT-3 but certainly a step towards it. Notably, GPT-3 was primarily trained on English data, so this represented a model better suited for use in China. Soon after, researchers at Huawei announced the 200-billion-parameter PanGu-α, which was trained on 1.1 terabytes of Chinese text. And so it went on: South Korean company Naver released the 204-billion-parameter model HyperCLOVA, Israeli company AI21 Labs released the 178-billion-parameter model Jurassic-1, and most recently NVIDIA and Microsoft teamed up to create the 530-billion-parameter model Megatron-Turing NLG.

These increases in size do not necessarily make these models better than GPT-3, given that many factors affect performance and notable improvements may not appear until a model has an order of magnitude more parameters. Nevertheless, the trend is clear: more and more massive models similar in nature to GPT-3 are being created, and they are only likely to grow bigger in the coming years.

[Image: The size of language models is growing at an exponential rate (source)]

This trend of tens of millions of dollars being invested in training ever more massive AI models appears to be here to stay, at least for now. Given how powerful these models are, this is very exciting, but the fact that primarily corporations with large monetary resources can create them is worrying, and in general this trend has many implications. So much so that earlier this year a large number of AI researchers at Stanford worked together to release the paper On the Opportunities and Risks of Foundation Models, which gave GPT-3 and other massive models of its kind the name "foundation models" and presented a detailed analysis of their possibilities and implications.

So, this is a big deal, and developments are happening faster and faster. It's hard to say how long this trend of scaling up language models can go on and whether any major discoveries beyond those of GPT-3 will be made, but for now we are still very much in the middle of this journey, and it will be very interesting to see what happens in the coming years.
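As a concrete contrast with GPT-3's paid API, here is the promised rough sketch of what using EleutherAI's openly released GPT-J-6B looks like in practice, via the Hugging Face transformers library. It assumes the checkpoint is published on the Hugging Face Hub under the EleutherAI/gpt-j-6B identifier and that a GPU with enough memory for the roughly 12GB of fp16 weights is available; treat it as an illustration rather than official usage instructions.

```python
# Sketch: text generation with EleutherAI's GPT-J-6B via Hugging Face transformers.
# Assumes the checkpoint is available as "EleutherAI/gpt-j-6B" on the Hub and that
# enough GPU memory is available (~12 GB for fp16 weights); details may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.to("cuda").eval()

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

GPT-3 itself, by contrast, can still only be reached through OpenAI's API (or, more recently, Microsoft Azure).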
About the Author: Andrey Kurenkov (@andrey_kurenkov) is a PhD student with the Stanford Vision and Learning Lab working on learning techniques for robotic manipulation and search. He is advised by Silvio Savarese and Jeannette Bohg.