Editorials

GPT-3 is No Longer the Only Game in Town

GPT-3 was by far the largest AI model of its kind last year. Now? Not so much.

Nov 6

Welcome to the fourth editorial from Last Week in AI! This is the last of our free editorials, and we hope you will consider subscribing to our Substack to get access to future ones and to support us. We'd really appreciate your support, and we are offering a steep discount to make it easy for you to help us out: Get 50% off for 1 year. Money aside, you can also support us by following us on Twitter, checking out our podcast, and sharing this post. Thanks!

---------------------------------------------------------------------

[Image: Why GPT-3 Matters | Leo Gao (source)]

TLDR: Organizations face significant challenges in creating a model similar to OpenAI's GPT-3, but nevertheless a half dozen or so models as big as or bigger than GPT-3 have been announced over the course of 2021.

GPT-3 is No Longer the Only Game in Town

It's safe to say that OpenAI's GPT-3 has made a huge impact on the world of AI. Quick recap: GPT-3 is a huge AI model that is really good at many "text-in, text-out" tasks (lengthier explanations can be found here, here, here, or in dozens of other write-ups). Since being released last year, it has inspired many researchers and hackers to explore how to use and extend it; the paper that introduced GPT-3 is now cited by more than 2,000 papers (that's a LOT), and OpenAI claims more than 300 applications use it.

However, people's ability to build upon GPT-3 has been hampered by one major factor: it was not publicly released. Instead, OpenAI opted to commercialize it and only provide access via a paid API (although, just this past week, it has also become available on Microsoft Azure). This made sense given OpenAI's for-profit nature, but it went against the common practice of AI researchers releasing models for others to build upon. So, since last year, multiple organizations have worked towards creating their own versions of GPT-3, and as I'll go over in this article, roughly half a dozen such gigantic GPT-3-esque models have been developed at this point (though, as with GPT-3, not yet publicly released).

[Image: From the OpenAI API page]

Creating your own GPT-3 is nontrivial for several reasons. First, there is the compute power needed. The largest variant of GPT-3 has 175 billion parameters, which take up 350GB of space, meaning that dozens of GPUs would be needed just to run it and many more would be needed to train it. For reference, OpenAI has worked with Microsoft to create a supercomputer with 10,000 GPUs and 400 gigabits per second of network connectivity per server. Even with this sort of compute power, such models reportedly take months to train. Then there is the massive amount of data required: GPT-3 was trained on about 45 terabytes of text data from all over the internet, which translates to roughly 181 billion English words (181,014,683,608, to be precise) plus many more in other languages (though this came from filtered, publicly available datasets), further exacerbating the need for expensive compute to handle it all. Taken together, these factors mean that GPT-3 could easily have cost 10 or 20 million dollars to train (exact numbers are not available).
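To put those numbers in perspective, here is a minimal back-of-the-envelope sketch (not a description of OpenAI's actual setup) of why just serving the largest GPT-3 variant calls for dozens of GPUs. It assumes half-precision weights at 2 bytes per parameter and a hypothetical GPU with 32GB of usable memory; both are illustrative assumptions rather than figures from the paper.

```python
import math

# Illustrative assumptions (not from OpenAI): fp16 weights at 2 bytes per
# parameter, and a hypothetical GPU with 32 GB of usable memory.
PARAMS = 175e9          # largest GPT-3 variant: 175 billion parameters
BYTES_PER_PARAM = 2     # half precision (fp16)
GPU_MEMORY_GB = 32      # hypothetical accelerator

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_for_weights = math.ceil(weights_gb / GPU_MEMORY_GB)

print(f"Weights alone: ~{weights_gb:.0f} GB")            # ~350 GB, matching the text
print(f"GPUs just to hold the weights: {gpus_for_weights}")
```

And that is only inference: training additionally needs memory for gradients and optimizer state (often several more copies of the parameters), plus enough throughput to churn through tens of terabytes of text, which is how the bill climbs into the millions of dollars.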
Previous large (though not as large as GPT-3) language models such as GPT-2, T5, Megatron-LM, and Turing-NLG were similarly costly and difficult to train.

[Image: GPT-3 is far larger than previous similar models (source)]

Nevertheless, it was only a matter of time before GPT-3 was successfully recreated (with some tweaks) by others. Surprisingly, one of the earlier efforts to release results came from a grassroots group of volunteers rather than a company with immense amounts of money like OpenAI. The group in question is EleutherAI, "a grassroots collective of researchers working to open source AI research." They first released a dataset similar to the one OpenAI used to train GPT-3, which they named The Pile. Next came GPT-Neo 1.3B and 2.7B (B meaning billions), smaller-scale versions of GPT-3, followed most recently by a 6-billion-parameter version called GPT-J-6B (see the short usage sketch at the end of this article). All of this was done by volunteers working together over Discord (plus some generous donations of cloud computing credits).

[Image: The start of EleutherAI's journey (source)]

Meanwhile, other groups were also working towards their own versions of GPT-3. A group of Chinese researchers from Tsinghua University and BAAI released the Chinese Pretrained Language Model (CPM) about 6 months after GPT-3 came out. This is a 2.6-billion-parameter model trained on 100GB of Chinese text, still far from the scale of GPT-3 but certainly a step towards it. Notably, GPT-3 was primarily trained on English data, so this represented a model better suited for use in China. Soon after, researchers at Huawei announced the 200-billion-parameter PanGu-α, which was trained on 1.1 terabytes of Chinese text. And so it went on: South Korean company Naver released the 204-billion-parameter model HyperCLOVA, Israeli company AI21 Labs released the 178-billion-parameter model Jurassic-1, and most recently NVIDIA and Microsoft teamed up to create the 530-billion-parameter model Megatron-Turing NLG.

These increases in size do not necessarily make these models better than GPT-3, given that many factors affect performance and notable improvements may not appear until a model has an order of magnitude more parameters. Nevertheless, the trend is clear: more and more massive models similar in nature to GPT-3 are being created, and they are only likely to grow bigger in the coming years.

[Image: The size of language models is growing at an exponential rate (source)]

This trend of tens of millions of dollars being invested in training ever more massive AI models appears to be here to stay, at least for now. Given how powerful these models are, this is very exciting, but the fact that primarily corporations with large monetary resources can create them is worrying, and in general this trend has many implications. So much so that earlier this year a large number of AI researchers at Stanford worked together to release the paper On the Opportunities and Risks of Foundation Models, which gave GPT-3 and other massive models of its kind the name "foundation models" and presented a detailed analysis of their possibilities and implications.

So, this is a big deal, and developments are happening faster and faster. It's hard to say how long this trend of scaling up language models can go on and whether any major discoveries beyond those of GPT-3 will be made, but for now we are still very much in the middle of this journey, and it will be very interesting to see what happens in the coming years.
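As a concrete contrast with GPT-3's paid API, here is the promised rough sketch of what using EleutherAI's openly released GPT-J-6B looks like in practice, via the Hugging Face transformers library. It assumes the checkpoint is published on the Hugging Face Hub under the EleutherAI/gpt-j-6B identifier and that a GPU with enough memory for the roughly 12GB of fp16 weights is available; treat it as an illustration rather than official usage instructions.

```python
# Sketch: text generation with EleutherAI's GPT-J-6B via Hugging Face transformers.
# Assumes the checkpoint is available as "EleutherAI/gpt-j-6B" on the Hub and that
# enough GPU memory is available (~12 GB for fp16 weights); details may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.to("cuda").eval()

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

GPT-3 itself, by contrast, can still only be reached through OpenAI's API (or, more recently, Microsoft Azure).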
About the Author: Andrey Kurenkov (@andrey_kurenkov) is a PhD student with the Stanford Vision and Learning Lab working on learning techniques for robotic manipulation and search. He is advised by Silvio Savarese and Jeannette Bohg.