[HN Gopher] AI Democratization in the Era of GPT-3 ___________________________________________________________________ AI Democratization in the Era of GPT-3 Author : jonbaer Score : 66 points Date : 2020-09-26 17:02 UTC (5 hours ago) (HTM) web link (thegradient.pub) (TXT) w3m dump (thegradient.pub) | [deleted] | data_ders wrote: | I do agree that it isn't "democratization" in the sense of "having access" and that it's a buzzword. But with Azure Cognitive Services and Azure AutoML, I do think MSFT has shown itself capable of point 3: enabling its customers "to use the algorithms and models, potentially without requiring advanced mathematical and computing science skills". | skybrian wrote: | It's certainly annoying not to have access to the API to try a few things. However, optimization has just begun and it seems like there will probably be competing models 100x smaller in a year or two? | m1rr0rman wrote: | RussianNationalAnthem.mp3 | Fordec wrote: | I applied for API access months ago and still haven't been able to tinker with it. | | Until it's immediate sign-up and not some insider walled-garden waitlist, from a developer perspective, OpenAI and GPT-3 are anything but Open. | | If I were really pushed to get access, my best chance of success right now appears to be a search on GitHub for somebody else's leaked API key in a repo somewhere. | mensetmanusman wrote: | I learned some things. | | GPT-3 is so complex that the model requires large cloud computing resources to run. Ergo, it is also very expensive to run. | | Assumption: Bleeding-edge AI will require tens of millions of dollars of computation before new network architectures fall out of state space. After this, the models can be pruned to be run by mere mortals. | | If this is true, OpenAI will not be able to move to the next level without partnering with very wealthy institutions a la Microsoft.
| | If this is true, those calling for OpenAI to not monetize intermittent progress are essentially preventing next-generation discovery, unless they have alternative monetization ideas to generate 8 figures for research. | anoncareer0212 wrote: | Given the worst-case assumptions of people arguing the cloud is required[1], it's a one-time investment of $60,000, $200,000 in the absolute worst case. | | [1] https://news.ycombinator.com/item?id=24601264 | exged wrote: | That's for model inference. For training, OpenAI said they used around 3000 PetaFLOPS / days on the largest GPT-3 model. That translates to about 300 Nvidia A100 GPUs if you want training to finish in a month (any slower and your researchers are not going to be able to make much progress). A system like that would cost at least $5M, probably more like $10M. | 2sk21 wrote: | Surely $10 million should be well within the spending abilities of a fair number of tech people? Many universities now have fairly large computing clusters as well. | lopmotr wrote: | Is that unit PetaFLOPS/day right? I think it should be PetaFLOPS-day, which has dimensions of FLOP (total number of operations), rather than FLOP/time^2. | | The cost wouldn't be the cost of the hardware because it still exists afterwards. You'd have to discount it for the amount of time it was in use. | choppaface wrote: | There will always be models and research that require proprietary hardware and resources and thus cannot be "democratized." PageRank required a ton of hardware and an index. Non-industry researchers will always be priced out of something. | | A key concern here with regards to the ethics-of-AI issue is that last year OpenAI refused to release GPT-2 because it was too 'dangerous'. This year, GPT-3 is suddenly a revenue-generating Microsoft product. The Gradient article linked is one of the more diplomatic ways of calling BS on OpenAI's strategy.
Financial interests taint research in subtle ways, and have for decades; OpenAI employees being paid $1m in cash salary need reminders of this fact. | | https://openai.com/blog/better-language-models/ | 29athrowaway wrote: | Powerful models are trained using large amounts of data that most people do not have access to. | | They are also trained on powerful infrastructure that most people do not have access to. | | So, to speak of democratization is interpreting the current state of affairs incorrectly. | boulos wrote: | See my sibling comment, but the dataset that the GPT-2 folks built was "just grab whatever people on reddit upvoted". Any grad student might have done the same (and in a sense, James Hays and Alyosha Efros did something like that long ago with flickr data for im2gps). | | While GPT-3 is trained on a massive V100 cluster, you could probably do so with a much smaller one / there exist interesting smaller models. It's _expensive_ to rent this class of equipment, but it is available. | | The distinction is that OpenAI made a focused bet. Most research funding and labs spread their bets heavily (e.g., each institution or researcher gets $50k/yr of funding). OpenAI takes a different stance, but it's not clear that they're spending even as much as, say, Google, Facebook or other large institutions. It's also not obvious that you even have to play the same game to get similar results. | mirekrusin wrote: | They trained it on freely available datasets managed by nonprofits, as somebody mentioned. | | Powerful infrastructure can be democratized in a similar way to how SETI@home, cryptocurrencies and other P2P projects do it. | | In theory this can be done. | jackcosgrove wrote: | The PC was supposed to democratize computing. It did, and then Microsoft found a choke point. The internet was supposed to democratize communication. It did, and then Google found a choke point.
| | It seems like this two steps forward, one step back pattern might be the rule rather than the exception. Even the article defines AI democratization in terms of using models rather than training models, as the costs of training sophisticated models seem beyond the common developer even according to idealists. | | AI in particular seems to be a centralizing technology at this point, given that a model is often a black box to the user. The amount of data needed also requires a massive telemetry apparatus, and designing likely models seems the province of people with PhDs. | | So yes, I'm a bit pessimistic that AI will have a democratizing effect on technology or society, at least in the near future. | r7r73hdhdu wrote: | This seems like the most cynical take possible; most of those choke points aren't really that firm anymore. Microsoft had a brief chokehold and then smartphones became the computer of the working class. Google is dominant in search but now they're competing against other megacorps who are investing heavily in catching up. All that's died is the naivety of the 80s and 90s that believed small mom and pops would somehow be outcompeting megacorps rather than serving as product development for them. | ssss11 wrote: | Isn't this just human nature? Not everyone is looking to improve the world and when they get power... well, power corrupts. I'm equally pessimistic, not about new technologies but about who controls them - we need to empower societies, not corporations. | srtjstjsj wrote: | > The PC was supposed to democratize computing | | It was? | DSingularity wrote: | Yes. Before that it was mainframes, and only those with access to $ had access to computing. | nl wrote: | Yes. The name gives it away. | | And Microsoft's mission for many years was "a computer in every home". | decasteve wrote: | The airplane and the radio brought people closer. They also improved the coordination and delivery of mass destruction.
| | Technology can be a rollercoaster ride of benefit and detriment. | tobr wrote: | Technology is an amplifier of things we're already doing. It's easy to think of all the amazing things people are capable of, and imagine that new technology will help those amazing things flourish. | | But people are also capable of unspeakable cruelty. Depending on your outlook on humanity, that insight may or may not give you sympathy for neo-Luddites. | ben_w wrote: | How else would you define "AI democratization"? The fact I can even program puts me in an elite of less than 0.5% of the world population [0], and that includes all specialties, not just AI. | | I do iPhone apps these days; even though I can follow the various tutorials for how to train an AI to recognise handwritten digits [1], I don't _actually_ grok the maths behind the back-propagation algorithm and why it's better than, say, simulated annealing of the weights (which I do grok). | | True democracy puts the power in the hands of the masses; making it available to all developers _and only developers_ is like giving the vote to only the richest half of all millionaires. Packaging AI into a simple magic black box makes it available to everyone. | | I don't know if normal people have the right expectations of the tech for that to be generally wise, but that is a separate problem and black boxes are the only way I can see to achieve the goal. | | [0] https://www.daxx.com/blog/development-trends/number-software... | | [1] https://kitsunesoftware.wordpress.com/2018/03/16/speed-of-ma... | cblconfederate wrote: | I don't see where the monetization potential of GPT-3 is. Creating more spam? Making poetry in bulk? Being SOTA in language tasks doesn't make it useful per se, as NLP measures are rather abstract anyway.
As for its technology, it's huge, but unless they are hiding some secret sauce, it's straightforwardly transformer-based, which means someone somewhere with cheap electricity is already training it on a dump of the entire web. Where's the moat here? Perhaps in some future version that they'll keep secret? Well, it will be a sad day for humanity if they keep its development secret, given how much its development has benefited from open science around the world. | angel_j wrote: | Forget AI _democratization_, GPT-3 is AI _demoguerization_. | | GPT-3 is singular; it is one model, one dataset, one training. Yet it will be the only one that will exist for quite some time (or by far the most available), and now it will underwrite _productization_ and malfeasance, a la mode pay to play. | | For example, I recently read a paper supposedly written by a Chinese dissident virologist, which report was disseminated by a group with questionable membership. Most of the jargon in the report going over my head, I had to wonder if the otherwise convincing verbiage wasn't the work of GPT-3. | chime wrote: | Not saying I am a fan of everything going on in the tech world, but it is pretty evident that AI is going to end up happening the Westworld "Rehoboam" big-AI-brain way rather than the Jetsons AI-maid way, not purely due to corporate interests but rather due to the inherent cost of just making it work. Unless we can have PB/EB worth of storage locally, it's going to be the cloud. And a cloud API with 1000 EB of storage will always outperform your local 10 PB AI model. | | The most practical approach that I can see would be to minimize the cost of accessing the model, making it free like the author suggests for research, students, non-profits, etc. and charging more for commercial usage, basically extending the existing cloud model. | | One thing that I think should be free for everyone would be testing the model for biases.
Basically, if every AI API to check bias on any topic were free, then it could be improved by anyone, including marginalized groups. If GPT-3 thinks all Indians are either doctors, coders, or gas station owners, then I would like to be able to test, verify, and offer a patch without any cost, maybe even a reward. Otherwise GPT-4-5-6 will end up throwing away all Indian-sounding last names applying for a construction job. | andreyk wrote: | Curious as to what people think wrt the final question: "Do we trust [OpenAI] to take on this role [of deciding who can have access to newer SOTA models and who can't]? And if not, how can academics and practitioners fight for the continued democratization of AI, as some of its most important techniques become as hard to replicate as GPT-3?" | | It seems like for super-gigantic models, good infra to train on cloud deployments (for which many labs can have credits / grants) would be the first priority, followed perhaps by good compression/pruning tech so that after training, inference can be done on one/fewer GPUs. Any other things? | visarga wrote: | Maybe the authors of the text should also have a stake in GPT-3. After all, OpenAI didn't write the corpus. Google benefits from the web, and Facebook from the real-life social networks and their activities that it replicates online (messages, meetings, news, etc). We are all the source of the training data. Why should we be at the whims of these derivative product companies? | andreyk wrote: | Fun fact: OpenAI did not collect the data itself - they mainly used data from Common Crawl (in addition to a couple of other datasets), which is compiled by a nonprofit that shares the dataset for free. So perhaps the license of such datasets can encourage free sharing of research outcomes.
| | https://commoncrawl.org/ | searchableguy wrote: | Shouldn't I own part of GPT-3 because it's trained on my data and likely spitting out what I have commented somewhere on the internet? | | I think ML models should be public unless the data itself isn't. | searchableguy wrote: | Curious about the downvotes. If you remove compute and code, what remains is data that isn't owned by a single entity in the case of GPT-3. You can sell both compute and code, which is what you own, but should you be able to sell data? | | You can't translate a commercial book without paying the copyright holder. You own the copyright for the translation, but everything else still remains that of the original author. Why wouldn't this apply in the case of a machine learning model trained on my images, messages, intents, likeness, etc.? | isx726552 wrote: | But you don't understand, OpenAI has hype and "rockstars" and big names attached, so those rules don't apply. | | Your data? It's their data now. (And Microsoft's.) | whichquestion wrote: | Would we be able to democratize these large ML models using something like SETI@home or similar? Would more research into distributing models across large networks be useful? Has research on this topic been done? | sillysaurusx wrote: | I'm amazed no one pointed out that OpenAI will be dead within 5 years _unless_ it monetizes GPT-3. | | Democratizing AI is a fine goal, but it's secondary to not dying. At least, for OpenAI it is. And that should be obvious. | nl wrote: | This is a bad take. In 2 years' time there will be a quantized, TensorCore-optimised equivalent that will run fine on desktop GPUs. | | We've seen this over and over again and there is no reason to believe this is different. | ve55 wrote: | OpenAI's goal is definitely _not_ to give everyone unlimited/equal access to powerful tools like GPT-3.
We've had countless jokes about the name being 'OpenAI', and perhaps it's true that it's not the best name (along with 'democratizing' AI), but I'm not sure the author is suggesting a solution here rather than just venting that things seem kind of unfair; no one outside of OpenAI really has much control, or the information he asks about. | | But I personally find the complaints to be understandable, especially as someone that didn't get a response for my requests for GPT-3 beta access: it felt pretty bad to watch everyone else have fun building cool things with the world's best text AI while I sat there and couldn't do anything, even if I was willing to pay for access. | | Hopefully there will be other relevant players here besides just OpenAI sooner or later. | gdb wrote: | (I work at OpenAI.) | | > especially as someone that didn't get a response for my requests for GPT-3 beta access | | We are still working our way through the beta list -- we've received tens of thousands of applications and we're trying to grow responsibly. We will definitely get to you (and everyone else who applies), but it may take some time. | | We are generally prioritizing people with a specific application they'd like to build; if you email me directly (gdb@openai.com) I may be able to accelerate an invite to you. | hanoz wrote: | _> We are generally prioritizing people with a specific application they'd like to build._ | | Why? | csande17 wrote: | OpenAI's goals are (1) make money and (2) generate positive press coverage about OpenAI. (They make statements about wanting other things but that's mainly to help them achieve (2).) | | Prioritizing people with concrete project ideas helps them in both areas: they're more likely to convert into paid customers down the line, and they're more likely to generate "OpenAI technology is now being used for X" press releases.
| elpin wrote: | I think there's a fair argument that groups attempting to make a specific product are more likely to drive platform development than random individuals who just want to noodle around. This isn't to say that individual experimenters won't drive development too, just that when you're dealing with limited resources you do have to make some decisions about allocation. | | Just framing it in terms of money and "generating positive press coverage" is a little cynical IMO. Is prioritizing cool use cases that push the boundaries of today's technology, beyond "haha look I can make GPT-3 parody VC Medium/LinkedIn articles", just press optics? I don't think so, but I can also understand the concern, especially given this article is about democratization. | ve55 wrote: | Thanks for the response - I had assumed the beta period was soon coming to an end, so by the time I was able to have access I'd have to pay just for basic experimentation. It was hard to say specifically what I'd design since I'd have to experiment with the API first to see if the ideas I had were feasible, so I probably did a poor job at that part of the application, but appreciate the offer! | langitbiru wrote: | "other relevant players" -> Google could create one. | csande17 wrote: | Surely the solution here is "put the model on BitTorrent, you cowards". | | Like, okay, the model's big and unwieldy to run. But hardware's always getting better, and there are lots of research use-cases where it's okay if it takes ten minutes to page the model in and out of SSD while generating predictions. Plus, maybe we'd get some more discoveries in the field of efficiently running huge models. | | The arguments about "safety" were PR nonsense when they were making them about GPT-2, and they're nonsense now.
It's a robot that blends up Reddit posts in a food processor; it's barely more advanced than tapping the iPhone predict-next-word button over and over; it's not going to hack the Pentagon or take over the world. The only reason OpenAI has ever had to not publish their models -- and I am ashamed that this industry doesn't call them out more often on this -- is so that they can generate positive press coverage on launch day with irrefutable cherry-picked examples. | boulos wrote: | The author, though, raises concerns about both the availability (openness) of the model as well as the current ability to run it due to cost (equity / equality of access). Making the model available would still not make it equal access. | | I'm not saying I agree or disagree with the openness argument, but the equality argument is separate. | claudeganon wrote: | If they released it, people would figure out a way to run it "equitably" within months, if not weeks. | | The amount of cheap GPU access floating out there is nuts. You can spin up a GPU instance to do best-in-class ML stuff using Fast.Ai on services like Paperspace or Colab, right now, for free. | claudeganon wrote: | It's beyond absurd to build a model with public, user-created data, gathered and released for free by a nonprofit, and then claim "uh, the model's too big and unwieldy, so we have to keep things under lock and key." | | I don't doubt that they'll profit handsomely from this approach, but it's the height of cynicism to engage in this kind of stuff, and their statements around the practice should be taken in kind. | ma2rten wrote: | The article is hinting at this, but I also think many people who complain that OpenAI didn't release the model don't understand how big this model actually is. Even if they had access to the parameters they couldn't do much with it. | | Assuming you used half precision, the model is 350 gigabytes (175 billion * 2 bytes).
For fast inference the model needs to be in GPU memory. Most GPUs have 16GB of memory, so you would need 22 GPUs just to hold the model in memory, and that doesn't even include the memory for activations. | | If you wanted to do fine-tuning, you would need 3x as much memory for gradients and momentum. | tarboreus wrote: | If it was open, there would be other services offering this, and not just an opaque beta and now a single expensive service. | wslh wrote: | Tell cryptocurrency miners that this is a big model to compute... the size of this problem seems very tiny. | | If there are millions of ASIC, GPU, etc. devices mining cryptocurrencies, it is fair to speculate that democratizing AI has a special room in this model. | anoncareer0212 wrote: | A one-time investment of $60,000, $200,000 in the worst case, isn't a way of dismissing the 'many people who complain the model [wasn't released]', especially given the alternative is 'being Microsoft', which costs $1,570,000,000,000. | liuliu wrote: | You need a tiny bit of memory for activations if you don't want fine-tuning. I think for GPT-3, fine-tuning is out of the window. But it is reasonable to expect inference to take less than a minute with a single 3090 and a fast enough SSD. | ma2rten wrote: | OpenAI offers a fine-tuning API. | | How did you come up with the one-minute estimate? According to a quick Google search I did, the fastest SSDs these days have a bandwidth of 3100 MB/s. So it would take 112s just to read the weights. | liuliu wrote: | I don't have access to see whether they have a fine-tuning API. Do you have any links explaining the said fine-tuning? It is certainly surprising given there is no fine-tuning experiment mentioned in the GPT-3 paper. | | Weights loading is embarrassingly simple to parallelize. Just use mdadm with 3 or 4 NVMe SSD sticks; that's sufficient. You are more likely bounded by PCIe bandwidth than the SSD bandwidth.
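| | The back-of-envelope arithmetic in this subthread is easy to sanity-check; a quick sketch using only the figures quoted above (175B parameters at 2 bytes each, 16 GB per GPU, ~3.1 GB/s for one fast NVMe SSD — everything else is illustrative):

```python
# Sanity check of the numbers quoted in this thread.
PARAMS = 175e9          # GPT-3 parameter count
BYTES_PER_PARAM = 2     # half precision, 2 bytes per weight
GPU_MEM_GB = 16         # typical 16 GB card
SSD_GB_PER_S = 3.1      # fast NVMe, sequential read (~3100 MB/s)

model_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_to_hold = -(-model_gb // GPU_MEM_GB)   # ceiling division
load_s = model_gb / SSD_GB_PER_S

print(f"weights: {model_gb:.0f} GB")                  # 350 GB
print(f"GPUs just to hold them: {gpus_to_hold:.0f}")  # 22
print(f"one-SSD load time: {load_s:.0f} s")           # ~113 s
```

Striping the weights across N SSDs divides that load time by roughly N, until the PCIe link to the card becomes the bottleneck, which is the point made above.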
Newer NVIDIA cards with | PCIe-4 support helps. | [deleted] | boulos wrote: | Disclosure: I work on Google Cloud and have worked with the | OpenAI folks on large models. | | This article mixes both "should research be open" and "is this | work cheaply reproduced / accessible": | | For smaller, open models: | | > The average person could not recreate models of this size from | scratch, but the models can run on a single machine with a single | GPU. | | but about GPT-3: | | > GPT-3 represents a new circumstance. For the first time, a | model is so big it cannot be easily moved to another cloud and | certainly does not run on a single computer with a single or | small number of GPUs. Instead OpenAI is providing an API so that | the model can be run on their cloud. | | While I'd quibble with "for the first time" (it's easy to | generate mega models! Plenty of distributed mesh tensorflow stuff | does that, etc.), I don't think this is any different than large | physics simulations. | | Is it "wrong" to have some groups push the boundary of what's | possible with supercomputers? I certainly don't think so. If | anything, it shows what's possible and others can do the valuable | work of "miniaturization". In this specific area, PRADO is a good | example relative to BERT. For my historic area of ray tracing | research, we did lots of things on an SGI Origin that let us | "jump ahead a few years" versus what we could have done on any | basic workstation. | | You could argue that it's not academically interesting ("you just | ran this really big because you had the budget / hardware, | whatever") and reject the paper. [Edit: I consider this kind of | work interesting from a _systems_ perspective, not "ML", but it's | still interesting!] I don't think it makes sense to suggest that | we should hold back progress based on NSF grant funding or least | common denominator computing resources. How would you decide | what's acceptable? Is a single A100 affordable? Only a T4? 
Only a laptop? | | tl;dr: it's fine to argue about openness and democratization being hollow marketing words. I'm not sure I would conflate openness with "everyone, everywhere should be able to run any scientific work without expense". | robmsmt wrote: | how big do the models get? are you able to say? | boulos wrote: | As big as you want, kind of. The challenge, as in large-scale physics, is how many nodes you can stick together with sufficiently high bandwidth (low latency is less important in the ML space, because there are lots of ops per byte, unlike some CFD simulations that have very few per update). | | On our Cloud TPU product page [1], we have a single TPU v3 pod with 32 TB of memory. For the most recent MLPerf submission, the TPU folks hooked up four of them [2]. There's obviously a reduction in scalability from doing so (see weak scaling versus strong scaling terminology), but that's the interesting co-design question: what kind of models can you usefully train in an "even more distributed" mode? | | Outside of TPUs though, even our single 16x A100 offering has 640 GB all connected by NVLINK (other providers went with 8, so 320 GB of "system memory"), and there are at least a few in a single rack. So the era of TiB-scale models is certainly "semi-feasible" and "open to all". | | The challenge is that you also need to train these for quite some time. 1000 V100s would cost you at least $2000/hr to rent. Many models are sufficiently complicated (not just large) that you end up training them for days and weeks, even with this much compute. So the numbers add up quickly. | | But just being "big" doesn't mean "trained for a month on a supercomputer". | | [1] https://cloud.google.com/tpu | | [2] https://www.google.com/amp/s/cloudblog.withgoogle.com/produc... ___________________________________________________________________ (page generated 2020-09-26 23:00 UTC)