[HN Gopher] Hello Dolly: Democratizing the magic of ChatGPT with... ___________________________________________________________________ Hello Dolly: Democratizing the magic of ChatGPT with open models Author : hnuser0000 Score : 387 points Date : 2023-03-24 12:21 UTC (10 hours ago) (HTM) web link (www.databricks.com) (TXT) w3m dump (www.databricks.com) | bob1029 wrote: | > Surprisingly, instruction-following does not seem to require | the latest or largest models: our model is only 6 billion | parameters, compared to 175 billion for GPT-3. | | We started seeing this in our testing. OpenAI's Curie model is | responding very well to our fine-tuning experiments for a | chatbot-style interface. I am trying to keep us focused on quality of | training data rather than obsessing over raw network size. | Davinci (and derivatives) might turn out to be overkill for our | use cases. | imwithstoopid wrote: | here come the "Me Too!!" announcements from everyone trying to | catch some of the energy of this new market | | how long until IBM, Tesla and Oracle announce Me-Too LLMs? | [deleted] | gavi wrote: | It's trained on the Alpaca dataset, which in turn was generated | from OpenAI's davinci. Wondering if it is actually transferring | the weights by generating content from the source model? | epups wrote: | I think this is cool, but it's in the range of complexity that I | would expect from a personal project. When you put a whole | organization behind it, I feel you could have provided something | extra - better datasets? Improved weights from a ton of training? | kvmakes wrote: | Super cool stuff! | Mizza wrote: | It's immediately become difficult to untangle the licensing here. | Is this safe for production use - I have no idea if I can expect | a DMCA from Mark if I step out of bounds with this or other | post-Alpaca models, unless I'm missing something important. Meta | really botched the Llama release. | pwendell wrote: | Yes it's nuanced, but will be simplified going forward. 
| | This uses a fully open source (liberally licensed) model and we | also open sourced (liberally licensed) our own training code. | However, the uptraining dataset of ~50,000 samples was | generated with OpenAI's text-davinci-003 model, and depending | on how one interprets their terms, commercial use of the | resulting model may violate the OpenAI terms of use. For that | reason we are advising only noncommercial use of this model for | now. | | The next step here is to create a set of uptraining samples | that is 100% open. Stay tuned. | Taek wrote: | Are you in touch with the OpenAssistant team? I believe they | already have a more or less complete set of samples | (100,000!) that were produced in an open environment and | aren't encumbered by any licensing. | pwendell wrote: | No I haven't heard of that, we'll engage with that team. | This is exactly what we need; will look into it. | babyyoda wrote: | Given that Alpaca strictly specified that they released purely | for academic use and any commercial use was prohibited, since | doing so would violate terms of service, I don't see this as | viable for use. Looks like a marketing gimmick | rnosov wrote: | This has nothing to do with Facebook. The foundational model | here is GPT-J which is open source and safe to use. Sadly, it is | inferior to state-of-the-art models such as LLaMA. | Mizza wrote: | But they're "using data from Alpaca". I don't know what that | means, isn't Alpaca using data generated by ChatGPT, which | isn't "clean" to use? Or data from Facebook, which isn't | "clean" to use? I'm drowning. | bilekas wrote: | I don't know the full details but Alpaca is from Stanford | and only based on the LLaMA (not a derivative work afaik). | That said: | | Also Meta's licensing here | https://github.com/facebookresearch/llama/blob/main/LICENSE | | Can't be sure what that license actually refers to, the | language model or just the tooling in the Git Repo. 
| | I agree it's a minefield, but with Meta I would err on the | side of caution. | rnosov wrote: | They are instruction tuning it using the dataset released | by the stanford-alpaca team. The dataset itself is synthetic | (created using GPT-3) and somewhat noisy and in my view can | be easily recreated if OpenAI ever tries to go after it | (which is very unlikely). Anyway, Facebook has nothing to | do with anything used by this project. | Mizza wrote: | So, this is a "dirty" model, in that it was created from | data which violated OpenAI's ToS. Obviously, this kind of | violation is basically fine if you're a massive | corporation who the rules don't apply to, but it's a huge | risk if you're a small fish. | sebzim4500 wrote: | That's between OpenAI and the people that recorded the | data. No one else needs to care. | hutzlibu wrote: | "basically fine if you're a massive corporation who the | rules don't apply to, but it's a huge risk if you're a | small fish" | | With these things, it is usually the other way around. | | If you are a small fish, no one will care. But if you are | big enough that money could be extracted from you, then | they will come. A big org just has better lawyers and | negotiating power, but they really cannot ignore the law. | Especially not if there is a competitor with money to | sue. | | So if you are small and want to become big, better be | cautious on the legal ground you are walking. | gremlinsinc wrote: | If you use output from a non-profit who open sourced the | output gained by following the TOS, as in they aren't | using it 'for profit', it's not illegal, because: | | A. it's an output gained via following the letter of the | law (TOS). | | B. TOS only applies directly to people who've accepted | the TOS; unless alpaca's license/TOS ALSO forwards the | same criterion as its source at OpenAI, derivatives | wouldn't be covered. 
| | It's like if an app developer on iOS violated a TOS, and | Apple tried to go after everybody who ever used the app, | they didn't agree directly to the TOS, only the developer | did. | rnosov wrote: | ToS are not the law. It would be similar to your power | company claiming copyright over the code written using | "their" electricity. Not going to happen. I wouldn't be | too concerned. | sp332 wrote: | No, but you could be banned from using OpenAI products in | the future, which seems like quite a liability for a | researcher or company. | rnosov wrote: | That would be an anticompetitive practice that is actually | against the law in many countries[1]. In the unlikely | event of OpenAI ever engaging in such things, they will be | sued into oblivion. | | [1] https://en.wikipedia.org/wiki/Refusal_to_deal | Spivak wrote: | Especially when OpenAI explicitly doesn't have a claim to | copyright on the model output. | bilekas wrote: | > Meta really botched the Llama release. | | It's no surprise really though, from what I see they recognised | some way to monetize and rolled back their commitment. | | But this Dolly doesn't depend on Llama (unless I'm missing | something), so you don't have to use it. | leobg wrote: | Why? Dolly had nothing to do with Llama or its weights. | | Besides: How would anyone ever know which model generated the | output you are serving? AFAIK there is no fingerprint in any | model's output. And even if there was, it would probably be | destroyed by fine tuning "over it". | stametseater wrote: | > _AFAIK there is no fingerprint in any model's output._ | | It seems like there easily could be. What if some of the data | they trained it on didn't exist anywhere else except in the | training set, and was put there specifically for this | purpose? For instance they could have taught it a few poems | that don't exist anywhere else. If you can coax the LLM of | unknown origin into reciting those poems back to you, you | know where it came from. 
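The fingerprinting scheme described above is easy to sketch. This is a minimal illustration, not any real watermarking library; the helper names are made up: plant unique canary strings in never-published "documents" in the training corpus, then probe a model of unknown origin for them.

```python
import uuid

def make_canary() -> str:
    # A random GUID-based string is effectively guaranteed not to
    # occur anywhere else on the internet.
    return f"canary-{uuid.uuid4()}"

def plant_canaries(corpus: list[str], n: int = 3) -> tuple[list[str], list[str]]:
    # Append n never-published "poems" containing canaries to the corpus.
    canaries = [make_canary() for _ in range(n)]
    poisoned = corpus + [f"An otherwise unremarkable poem: {c}" for c in canaries]
    return poisoned, canaries

def saw_our_training_data(model_output: str, canaries: list[str]) -> bool:
    # If a model can reproduce any canary verbatim, it almost
    # certainly saw that document during training.
    return any(c in model_output for c in canaries)
```

In practice you would coax the suspect model into completing the surrounding "poem" and check whether the canary appears in its output.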
| kurthr wrote: | Even easier: have a small set of 8-10 character gibberish | tokens it's trained on in particular contexts (e.g. a | non-existent poem). Then feed it one or several poems and see | if a gibberish token pops out. | eigenvalue wrote: | I think they call these canary GUIDs. If you manage to | generate one from an LLM then you can conclude with | certainty that the model saw that document during | training. | neilv wrote: | > _Besides: How would anyone ever know which model generated | the output you are serving?_ | | There's precedent for "whatever you can get away with" in | tech companies, but establishing a culture of that at the | start of this new big change could end up undesirable for | most people. | | For example, it could relieve demand for more legal and | sustainable ways, until it's too late. (Look at the history | of digital entertainment media piracy and DRM and | legislation, for example. Or look at the history of software | piracy, where some big companies seem to actually want their | product to be pirated, partly because it builds a bigger moat | against competitors, and they can legally strongarm some of | those pirates later.) | bilekas wrote: | This is really great news and something I felt was missing from | the market so far. It seems everyone wants to create `moats` or | walled-gardens with some aspect of their models etc. | | Nice job DataBricks, nice numbers too. Looking forward to more | improvements. | detrites wrote: | Thought the same until I read this: | | > Contact us at hello-dolly@databricks.com if you would like to | get access to the trained weights. | bilekas wrote: | This is not an issue though, they would just be the weights | used by DataBricks, there is no reason you can't add your own, | right? | | Like giving away a website template without the demo content, | it's perfectly normal. 
| superchink wrote: | https://github.com/databrickslabs/dolly it's now available on | GitHub | jppope wrote: | data transfer might actually be the problem there, not | something like trying to hide the model | yieldcrv wrote: | bittorrent, come on | crosen99 wrote: | Fine-tuning these models reminds me of the good ol' days with | tube TVs where the slightest twist of the vertical hold dial | meant the difference between a clear picture and useless, | dizzying, visual nonsense. | woeirua wrote: | This is the real risk to OpenAI's business model. If it turns out | that you can get most of the same outcome with drastically | smaller and cheaper models, then OpenAI is going to have a hell | of a time keeping customers around as it will just be a race to | the bottom on price and bigger, more expensive models will lose | just from a hardware cost standpoint. | xpe wrote: | No disrespect to the author intended, but the above comment is | muddled. | | 1. OpenAI, the organization, is not equivalent to its chat | offering. | | 2. Saying "the" real risk isn't persuasive. Let's examine many | risks before claiming one is the most significant. Also, "real" | in this usage is often a throwaway (i.e. unneeded) word, in editor | speak. | | 3. Let's talk about OpenAI's "business model" (though such | discussions are tricky). | | 3A. Originally, OpenAI wasn't trying to "hold onto" AI | advancements. It claimed to be a broadly funded way to explore | fundamental questions of artificial intelligence in a | non-commercial, ethical way. | | 3B. Of course, the above claim was largely aspirational, | because it wasn't baked into their DNA in a way that could | survive the surrounding temptations for more funding, glory, | and resources. | | 3C. Even with their more commercialized model of the last | several years, their business model feels like (a) | fundraising in exchange for (b) (claimed) collective-good open | source, tools, and shared research. | | 3D. 
OpenAI feels to me more and more like a commercial research | lab; there does seem to be a lot of commercial partnering with | their funding organizations (e.g. Microsoft). | | 4. I doubt the leadership there views the current ChatGPT | models as unchanging. I expect there is a considerable revenue | stream _around_ the space. OpenAI is well positioned to play | the game several steps ahead of others. | | I would frame the broader question this way: for many years, | there has been a hunger for this deeper AI research, due not | only to (i) the expertise and resources required, but also to | (ii) the hope that there is an organization that can maybe keep | it within human or ethical bounds. | | Unfortunately, this amorphous hope doesn't seem to be matching | the actual organizational incentives or dynamics. It is also | unclear how much demand the public in a free market will have for | nobler research. | | My position on these kinds of things is simple: follow the | money. If we want an accountable, public-interest AI research | laboratory, it's going to have to be designed, funded, and | overseen very differently. | smoldesu wrote: | On the flip-side, OpenAI is primed to destroy their | competitors. Partnership with Microsoft means they can buy | Azure compute at-cost if need be. Their current portfolio of | models is diverse on the expensive and cheap ends of the | spectrum, with thousands of people on Twitter and HN still | giving them lip-service. With dozens of clones hitting the | market, OpenAI is the only one staying consistently relevant. | | The widespread adoption of local AI won't obsolete a | well-priced AI API. I feel like we learned that lesson pretty | thoroughly in the SaaS era. | xpe wrote: | > The widespread adoption of local AI won't obsolete a well- | priced AI API. I feel like we learned that lesson pretty | thoroughly in the SaaS era. | | Unless I am misunderstanding (?), this seems like an | overgeneralized lesson. 
There are many key differences | between these situations that make such a connection | unlikely. Could you explain your reasoning? | ijustlovemath wrote: | The difference between this and SaaS is that businesses have | been moving their (end user) products to SaaS due to wider | broadband availability, as well as greed (read: MRR), but on | the LLM side, people are _building new products with it_, so | the incentives are to keep your costs low (or free) so you | can make more money once you release. | nico wrote: | That's why they are moving so fast and trying to get as much | press/media attention as possible. | | They want to stay top of mind. | | Think about Coca-Cola: anyone can make a drink just as good. But | it's almost impossible to build their brand and distribution | from scratch. | lfciv wrote: | I wouldn't underestimate the power of momentum | rashkov wrote: | What about the high quality training data that OpenAI has | encoded into ChatGPT? Do these other models come close to that? | woeirua wrote: | Why couldn't you just use OpenAI's API to feed prompts and | then take the outputs and use them to train your own model to | exfiltrate the best features of GPT? | xpe wrote: | Give it a try if you feel like it is a good thing to do. | I'm sure some nation states are doing it. | | P.S. this comment does not reflect my personal values. But | I would rather someone with values try it almost like a | white hat pen test. | wsgeorge wrote: | Because it would be against their TOS, and things could | look ugly, legally. | tspike wrote: | How many TOS agreements do you suppose they violated | while training their models? | AJ007 wrote: | It's still an open question if any of these models, | trained on copyrighted work, will themselves be eligible | for copyright protection. | ImHereToVote wrote: | Ironic | ImprobableTruth wrote: | Is this a bit? If it's illegal to train on copyrighted | material, then OAI has broken the law ten times over by | training GPT-3. 
There's absolutely zero reason for them to | sue; they'll just ban the responsible people. | nickthegreek wrote: | I think their TOS forbids using the API for this. I don't | think it covers the use of the web interface. | circuit10 wrote: | However: | | "You may not [...] except as permitted through the API, | use any automated or programmatic method to extract data | or output from the Services, including scraping, web | harvesting, or web data extraction;" | nickthegreek wrote: | Can't be automated, so manual extraction is allowed. | typon wrote: | That's how Alpaca is made | aabajian wrote: | Anyone care to comment on why the output of these models changes | so dramatically given so little Q&A training? It's a 6 billion | parameter model with only 50 thousand Q&A samples. | | It's clear the model already "knows" the format of a Tweet (short | length, attention-grabbing, contains hashtags). The model also | knows stuff about language models (word2vec, tokenization), and | can include entities from the question in its response (Dolly, | Databricks). Yet, it just doesn't put these pieces together in | the right way without the Q&A training. | | Edit: For kicks, I asked GPT-4 this question: | https://imgur.com/a/sM4uyBn | pwendell wrote: | Yes this was a very surprising result... that the relatively | small uptraining was able to unlock so much latent knowledge in | the model. | bogwog wrote: | Open Assistant is doing the same thing, but actually creating a | dataset that isn't on questionable legal grounds by creating a | gamified web app where people can contribute: | https://open-assistant.io/dashboard | | I wonder how small these models can get? From 175B to 6B with | comparable performance is huge, but can it go lower? | highwaylights wrote: | I see that in its five book suggestions it has suggested you | should read Hitchhiker's Guide twice. | | Not many humans would even get this answer correct. | | I am impressed. 
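Several comments above describe generating training data from a stronger model's API, which is the recipe behind the Alpaca dataset. The whole thing reduces to a simple collection loop. This is a hedged sketch; `query_teacher` is a stand-in for whatever real API or local model one would call (Alpaca used OpenAI's text-davinci-003 at this step):

```python
import json

def query_teacher(prompt: str) -> str:
    # Stand-in for a call to a stronger "teacher" model.
    # Substitute a real API client or local model here.
    return f"(teacher response to: {prompt})"

def collect_pairs(instructions: list[str]) -> list[dict]:
    # Record each instruction with the teacher's answer, in the
    # instruction/output shape the Alpaca dataset uses.
    return [{"instruction": i, "output": query_teacher(i)} for i in instructions]

def save_dataset(pairs: list[dict], path: str) -> None:
    # Persist as JSON for a downstream fine-tuning run.
    with open(path, "w") as f:
        json.dump(pairs, f, indent=2)
```

Whether the resulting dataset is encumbered by the teacher's terms of service is exactly the licensing question debated throughout this thread.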
| Zaheer wrote: | How hard would it be to embed this into an NPM module so anyone | can use it in their servers / apps locally? | [deleted] | sbussard wrote: | I'd like some clarification of terms - when they say it takes 3 | hours to train, they're not saying from scratch, are they? There's | already a huge amount of training to get to that point, isn't | that correct? If so, then it's pretty audacious to claim they've | democratized an LLM because the original training likely cost an | epic amount of money. Then who knows how much guidance their | training has incorporated, and it could have a strong undesirable | viewpoint bias based on the original training. | joshhart wrote: | The 3 hours is the instruction fine-tuning. The base | foundational model is GPT-J which was already provided by | Eleuther-AI and has been around for a couple of years. | | Note: I work at Databricks and am familiar with this project | but didn't work on it. | Taek wrote: | Do you know why GPT-J is being used instead of NeoX or any of | the other larger open source models? | cuuupid wrote: | I don't love the lack of quantitative comparison to Alpaca but a | commercial model (which sounds like it's in the works) would | finally move the needle on democratizing access to LLMs. | | Will also commend the authors for not falling into the "LLMs | can't perform without 200B params!" fallacy. For anyone reading, | 6B params is enough to train on a 3090. A PC rig for training or | running inference with this would set you back maybe $4k. | | The end game here is likely getting the model to perform well in | millions of parameters on specific tasks. Most business uses of | ChatGPT are pretty closed-domain tasks; it wouldn't be a huge | step to distill this model on a specific task and get it down to | 150-350M params (which is roughly BART size and can run on AWS | Lambda even). | nothrowaways wrote: | "ChatGPT, a proprietary instruction-following model" pun | intended. 
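For concreteness, the instruction fine-tuning discussed above starts from records shaped like the Alpaca dataset: JSON objects with `instruction`, optional `input`, and `output` fields, rendered into a single training prompt. A sketch of that rendering, with wording along the lines of the commonly used Alpaca template:

```python
def format_record(rec: dict) -> str:
    # Render an Alpaca-style record (instruction / optional input /
    # output) into one training prompt.
    if rec.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{rec['instruction']}\n\n"
            f"### Input:\n{rec['input']}\n\n"
            f"### Response:\n{rec['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Response:\n{rec['output']}"
    )
```

The fine-tuning run then simply continues language-model training on these rendered prompts, which is why it takes hours rather than the weeks the base pretraining took.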
| jawadch93 wrote: | [dead] | mydpy wrote: | What a great time to be in this field. It's advancing so quickly! | sillysaurusx wrote: | Interesting. DALL-E, Dalai | (https://cocktailpeanut.github.io/dalai/), and now Dolly are all | pronounced the same way. | | It feels like there should be an xkcd for this. | joseda-hg wrote: | Are they? (Not sarcastic, I'm not native and I wouldn't | pronounce them all that similar at first sight) | dwringer wrote: | As a native speaker, no, there's hardly any consensus I've | seen about how to pronounce them. Certainly there are trends. | But I pronounce Dalai somewhere between "Dah-lay" and | "Dah-lie", and DALL-E _sorta_ like Dolly ("Dah-lee"), but with a | deliberate pause ("Dahl Ee"). | [deleted] | outside1234 wrote: | What could go wrong | JLCarveth wrote: | > are all pronounced the same way | | No they're not. | mejutoco wrote: | AFAIK DALL-E is pronounced as Dali, as in Salvador Dali. | | https://en.wikipedia.org/wiki/Salvador_Dal%C3%AD | chatmasta wrote: | I figured it was a reference to the Dalai Lama (which doesn't | invalidate your comment, since that's also pronounced like | Dali). LLM -> Llama -> Dalai Lama | rburhum wrote: | Dali has an accent at the end, which puts the emphasis on | the last syllable. Dalai does not. They sound very different. | "dah-lee" vs https://m.youtube.com/watch?v=JhFbvuKn45w | sillysaurusx wrote: | Hmm. Is Salvador Dali pronounced differently than Dolly or | Dalai? The wikipedia page has "dah-lee" as the phonetic, | and https://www.google.com/search?q=pronounce+salvador+dali | sounds the same as | https://www.google.com/search?q=pronounce+dalai+lama. So it | seems like all three are identical. | ToValueFunfetti wrote: | The emphasis in Dali is on the second syllable, which is | at least different from Dolly. I've always pronounced | Dalai Lama the same as I would Dolly Lama, but Cambridge | dictionary is saying it should be Da-lay in both US and | UK pronunciations. 
| | Tangentially, it seems like most of the results for both | searches were autogenerated with TTS programs. I wonder | if our pronunciations will shift towards TTS mistakes | over time. Probably not, these videos only have a few | thousand views, but neat if true. | mejutoco wrote: | Dali has the stress on the last syllable, hence the | accent (but Dall-e probably not). In my native language | Dalai is pronounced "Da-lie", like another comment says | above. TIL Dolly is pronounced so similarly. I thought | the Do sounded like Doberman, but apparently not. | | https://www.merriam-webster.com/dictionary/dolly | 4ndrewl wrote: | I thought "Dalai" (pronounced "Dall Eye") rhymes with "Shall | I", and "Dali" (pronounced "Dahl eee") rhymes with "Carly" | tetraca wrote: | This is all very weird to me because I've always | pronounced Dalai as "Dah-lay". | chatmasta wrote: | Interesting. According to Google, it's a British | ("Da-lie") vs. American ("Da-lee") difference. | [deleted] | bilekas wrote: | Handy also to think of WALL-E. At least that's where my | assumption came from. | [deleted] | cosmojg wrote: | It's quite clearly a reference to WALL-E the environmentally | conscious robot, which is pronounced as you'd expect. I like | to think of it as DALL-E the surrealist robot painter. | mejutoco wrote: | That is exactly my interpretation. Both wall-e and Dali. I | think we are in agreement. | JohnFen wrote: | I totally failed to make that connection! Was that the | intended reference? What's the link to WALL-E? | renewiltord wrote: | Wow, just discovered that the American pronunciation for Dalai | Lama is Da-lee. Well, that's a discovery. | | This is like when Khan Academy came out and there was a guy | online saying it's a terrible brand because it sounds like Con | Academy which it doesn't in my dialect. | | Took a while to get it. | rockzom wrote: | How do you say Khan? 
| ricardobeat wrote: | Kan / k-a-n, A like in "father" | theSuda wrote: | I found this which matches how I say it (as an Indian) | https://www.howtopronounce.com/khan/4145893 | | It's the KH sound that doesn't really exist in English, | hence many get it wrong. | gowld wrote: | The KH is one thing, but for "con"-fusion (hah!), it's | also about the "higher" "caan" vs "cawn", which is a very | subtle difference. | xg15 wrote: | I guess after carcinisation comes dolly-fication... | | But I do like the penchant for whimsical naming schemes in that | field. First Sesame Street characters, now apparently | everything sheep... | thewataccount wrote: | I might be having a moment - but I can't find any links to a git | repo, huggingface, or anything about the | models/weights/checkpoints directly from the article. | | I just see a zip download that AFAIK also doesn't contain the | weights/checkpoints. I find this a bit odd, the contents of the | zip (from the gdrive preview) look like they should be in a git | repo, and I assume they download the model from somewhere? GDrive | usually has rate limits which I'm concerned about. | | If anyone from databricks reads this - are there plans to publish | this on a git repo somewhere, as well as the weights/checkpoints? | | EDIT: Oh I just noticed | | > Contact us at hello-dolly@databricks.com if you would like to | get access to the trained weights. | | This... seems odd for an article titled "Democratizing the magic | of ChatGPT with open models"? | MagicMoonlight wrote: | So it's another classic private-only model that they'll pull as | soon as the suckers have trained it up for them | thequadehunter wrote: | Lol. This is classic ML crap. Files with no documentation, no | links, multiple files with the same-ish name but no explanation | for which one is what. 
| nofinator wrote: | Yes, the ZIP on Google Drive owned by one of their engineers is | weird considering they have a pretty active GitHub presence of | open source projects, though it does use an Apache license like | their others. | | Perhaps Databricks suspected another big announcement coming | soon and wanted to get this announcement out? | amrb wrote: | Are they pulling a Facebook on model access? | thewataccount wrote: | From what I can tell they're fine-tuning EleutherAI's GPT-J. | | Alpaca was made to fine-tune LLaMA, however they also | released the dataset they used to do this, and it looks | like Dolly is this dataset applied to GPT-J, and does not use | LLaMA itself. | dragonwriter wrote: | I think they are dodging unclear legal issues surrounding | certain steps of the model-building process while being as | open as possible with the components given that constraint, | allowing downstream users to make their own legal risk vs. | effort choices. | pwendell wrote: | Yes, this. | amrb wrote: | Given the hardware/energy needed to train, it'd be nice to have | a legal document that said something like: this model has no | warranty, it may be a breakthrough machine or a hand | grenade. Use at your own risk! | slimsag wrote: | The README also says this: | | > This fine-tunes the [GPT-J | 6B](https://huggingface.co/EleutherAI/gpt-j-6B) model on the | [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) | dataset using a Databricks notebook. | | > Please note that while GPT-J 6B is Apache 2.0 licensed, the | Alpaca dataset is licensed under Creative Commons NonCommercial | (CC BY-NC 4.0). | | ...so, this cannot be used for commercial purposes | dwallin wrote: | Essentially every model worth anything has been trained on an | unfathomably large amount of data under copyright, with every | possible licensing scheme you could imagine, under the | assumption that it is fair use. 
While you can argue that it's | all built on a house of cards (and a court may well agree | with you), it's kind of arbitrary to draw a line here. | judge2020 wrote: | > under the assumption that it is fair use. | | No, because you as a human looking at "art" over your | lifetime and learning from it is not "fair use" of the | copyright, it's no-use at all. This is the crux of every | argument for both language models and AI Art models, | that their tools are learning how to draw, learning what | styles and characteristics of input art correspond the most | with words, and creating art with that knowledge just like | any other human, not simply collaging together different | pieces of art. | Taek wrote: | Fair use via "this is completely impossible to regulate so | you might as well embrace it" | ambicapter wrote: | > ...so, this cannot be used for commercial purposes | | The implication being that you're only "democratizing" | something if people can make money off of it? | dragonwriter wrote: | > ...so, this cannot be used for commercial purposes. | | The legal relation between models and training data sets | seems murky; of course, with the build tooling, you can also | substitute in another instruction-following training set if | you want to avoid licensing issues with the Alpaca set, | whereas if you aren't concerned with them, you can just blaze | ahead. | chpatrick wrote: | As far as I know the copyright situation for models is | ambiguous and also depends on the region. In the US you can't | copyright data made by an automated process but you can in | the EU, or something to that effect. | yieldcrv wrote: | > ...so, this cannot be used for commercial purposes | | or you can raise $30,000,000 right now and worry about the | copyright infringement lawsuit in 2026 or never. | thewataccount wrote: | > ...so, this cannot be used for commercial purposes | | Can't they also release the fine-tuned weights as | non-commercial as well? 
| dopidopHN wrote: | Thanks, I missed that email while skimming | pwendell wrote: | Full source code is up here now: | | https://github.com/databrickslabs/dolly | | Sorry it took us a day to get the external repo set up. | thewataccount wrote: | Awesome thank you! | | Was the Alpaca dataset being licensed as non-commercial the | only reason you aren't releasing the weights? Is it possible | to just release them under the same license? | pwendell wrote: | Yes the issue is that some of the training data is arguably | tainted with some noncommercial license (it's nuanced, | discussed below in my comment). We are releasing weights to | people who request but we just wanted to have an email | request flow so that we can make sure people know it's just | for noncommercial purposes. | | Working on a model without this issue. Certainly our goal | is totally open models anyone can use for anything. | thewataccount wrote: | Understandable, thank you for the response! | | I've been a bit jaded by the "open/democratizing ai" | stuff and then having companies stiff us at actually | making it open - but not wanting to be the first to | litigate these new types of issues ML brings is very | understandable. | | Question - Would you consider benchmarking a single 4090 | for your training? While training in a few hours with 8x | A100's is impressive, myself and I think others are | curious how that translates to consumer hardware. IMO | running/fine-tuning on consumer hardware is the ultimate | endgame for all AI models. | robwwilliams wrote: | Look forward to a response. We are heading toward a 6X | Bizon 4090 system as a test bed. | | https://bizon-tech.com/bizon-zx5500.html | m3affan wrote: | Databricks is on a roll | jppope wrote: | Does anyone else find it ironic that all these ChatGPT "clones" | are popping up when OpenAI is supposed to be the ones open | sourcing and sharing their work? 
| | I guess: "You Either Die A Hero, Or You Live Long Enough To See | Yourself Become The Villain"? | Taek wrote: | Sam Altman has turned into a megalomaniac. | brandall10 wrote: | Possibly, but it is a bit unusual that he has zero equity in | the company. So it might not be for monetary reasons. | [deleted] | JohnFen wrote: | > when OpenAi is supposed to be the ones open sourcing and | sharing their work? | | OpenAI renounced being open source. Don't let the name fool | you. | throwaway4837 wrote: | I think all of the "AI alignment" talk is mostly | fearmongering. It's a cunningly smart way to get ignorant | people scared enough of AI so they have no choice but to | trust the OpenAI overlords when they say they need AI to be | closed. Then OpenAI gets a free pass to be the gatekeeper of | the model, and people stop questioning the fact that they | went from Open to Closed. | | AI being tuned to be "safe" by an exceedingly small set of | humans is the thing we should be afraid of. It's the | effective altruism effect: if you bombard people enough with | "safety" and "alignment" speak, they will look past the fact | that you're mainly interested in being a monopoly. My bigger | conspiracy theory is that Bill Gates getting behind "AI | alignment" is a calculated move to get people to look past | Microsoft's unilateral involvement. | soup10 wrote: | I don't know what press releases you've been reading, but | the model is closed so they can make money off it, that's | pretty obvious. | throwaway4837 wrote: | I think that is a simple take and underestimates the | insidious nature of the AI alignment initiatives. Or | maybe I'm overestimating it. | TigeriusKirk wrote: | At this point I'm really not sure what they're up to in | terms of grand strategy. I don't even know that making | money is their ultimate goal. At a certain level of | ambition money is just a tool to get what you really | want. 
| brandall10 wrote: | It's interesting to note that Altman has no equity in the | company. One of the espoused motives for becoming a | for-profit company was to be competitive with big tech | as far as bringing in top-level research talent. | JohnFen wrote: | I don't think that Altman's lack of equity position in | OpenAI means anything at all when it comes to what | OpenAI's goals are. | | We know what their immediate goals are: to make as much | money as possible. The only question is what their | longer-term goals are. | 0xDEF wrote: | AI and high-performance semiconductors are the only | technological fields where the US and allies haven't been | surpassed by Russia and China. | | There is probably a lot of political pressure on OpenAI to be | as closed as possible. Remember the US government has banned | Nvidia from exporting A100/H100 to China/Russia. Those are the | same chips OpenAI uses for both training and inference. | amelius wrote: | Anyone in China/Russia who can comment on the actual | situation? How difficult is it to train/run AI models where | you are living? | coolspot wrote: | Russia is simply importing A100s through shell companies in | the UAE. | htrp wrote: | TLDR: | | Download GPT-J-6B from Eleuther | | Download Alpaca Fine Tuning Code + Alpaca Examples | | Train for 6 hours or so. | | Get vaguely good RLHF model | typon wrote: | Key point is vaguely good. Scale is still important, and it | manifests in the difference between the GPT-3.5- and GPT-4-based | ChatGPTs. GPT-4 is qualitatively and quantitatively so much better | in pretty much every benchmark. There is no way around the | bitter lesson. | bodyfour wrote: | > There is no way around the bitter lesson. | | Isn't there? I'm certainly not sure, based on the results | published over the last weeks and months.
| | The giant GPT-{3.5,4} models show that if you make the model | big enough and throw enough data at it you can produce an AI | capable of conversing on basically any topic, in dozens of | languages. There are plenty of different takes on how | near-human its abilities are on specific tasks, but it's worth | stepping back and appreciating how super-human the _breadth_ | of this knowledge is. | | But it's also not clear if a mega-model is anything close to | the most efficient way of storing knowledge. After all, you | don't need to memorize every fact in Wikipedia if you know | how to effectively search it. | | And we're currently seeing a daily explosion in these | capabilities. Today's flavor is interfacing with Wolfram, but | we've also seen web searches, python coding, etc. That, I | think, is the real superpower that comes out of this: you or | I can answer a question by "doing a web search" or "querying | a database" or "using Wolfram" or "developing a python program | that finds the answer". An AI, however, could do tasks like this | just by "thinking" about it. Maybe it would be as natural as | we find blinking. | | That to me is the real breakthrough in stuff like Alpaca -- | start with a mega-model and prompt it with something like: | "After this paragraph, you are going to be speaking to an AI | model similar to yourself but much more primitive. Its task | will involve interfacing with English speakers, so converse | with it only in that language. It has access to the same | {X,Y,Z} APIs you have, so any time it has trouble answering a | question, prefer to give hints about how it could find the | answer using those APIs rather than providing the answer | directly yourself. Only give an answer directly if it | repeatedly fails to be able to answer it by using an API. | I've provided a large set of standardized tests used by | humans at this URL -- start by asking it questions intended | for a preschool-aged child.
Each time it is able to answer | new questions at a given level correctly 99% of the time, | increase the material's level until it is able to achieve | that score on a test designed for a Computer Science PhD | candidate" | | How large would the "student" model have to be to succeed at | this deep but narrower task? I think the answer right now is | "we have no idea". However, if the model has the advantage | that it can rely on external knowledge and tools from the | start (and is rewarded by the "teacher" for doing just that) | I bet it'll be a lot smaller than these mega-models. Sure, | you wouldn't be able to disconnect the "student-AI" from its | APIs and expect it to converse with you in Hungarian about | the history of yacht design, but that might not be a | capability it needs to have. | | My personal hunch is that we're going to find these | "AI-taught specialist AI, with API access" models will be a | lot smaller than most people are expecting. That's the moment | when things REALLY change: instead of pairing a human with a | mega-model AI, if specialized models are cheap someone can | say "spin up 100K expert-programmer AIs and have them | supervised by 5K expert-manager AIs and have them build XYZ" | | Or if you need it to work on an existing task you'd | specialize further -- you'd go to your AI vendor and say "I'd | like to license the weights for your expert-programmer model, | but first have it read these 200 books I consider important | to my problem domain and then show it every commit ever made | by a human to my git repo and every design document I have" | typon wrote: | Very good analysis. I disagree with a fundamental point | though: If you don't consider compute cost and just want | the best possible AGI, then there's nothing stopping you | from supercharging the mega-models with the same | capabilities as the smaller models - and if the current | scaling trends show anything, the mega-models will just become | even better.
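The fine-tuning recipe htrp summarizes above (GPT-J-6B plus Alpaca-style examples) hinges on serializing each (instruction, input, response) record into a fixed prompt template before supervised training. A minimal sketch, assuming the template popularized by the Stanford Alpaca repo; the function name here is illustrative, not from the Dolly code:

```python
# Sketch of Alpaca-style prompt serialization (template paraphrased from
# the Stanford Alpaca repo; function name is my own, for illustration).

def format_alpaca_prompt(instruction: str, inp: str = "") -> str:
    """Build the prompt half of one supervised fine-tuning example."""
    if inp:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{inp}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

# During training the model learns to continue this prompt with the
# reference response followed by an end-of-text token.
prompt = format_alpaca_prompt("Name the largest planet in our solar system.")
```

The prompt always ends at "### Response:", so generation at inference time picks up exactly where the training targets began.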
| bodyfour wrote: | > If you don't consider compute cost [...] | | Yes, but what if you do? Imagine your hyper-specialized, | API-heavy model takes 10x fewer resources to answer a | question (or at least a question relevant to the task at | hand). Won't it be more powerful to have a model that can | run 10 times as fast (or run 10 instances in parallel)? | | What if the ratio turns out to be 100x or 1000x? | | So I agree that the cutting edge of "best possible AGI" | might mean building the largest models we can train on | massive clusters of computers and then run on high-end | hardware. My hunch, though, is that models that can be | run on cheap hardware and then "swarmed" on a problem | space will be even more powerful in what they can perform | in aggregate. | | Again, it's just my hunch, but right now I think | everybody's predictions are hunches. | | I'll actually go one bit further: even for a linear task | that can't be "swarmed" in the same way, it could be that | cheaper-per-token models could do better on linear | problem-solving tasks. Existing models already have the | ability to use randomness to give more "creative", if | less reliable, answers. This is inherently parallelizable, | though -- in fact Bard seems to be exposing this in its | UI in the form of multiple "drafts". So what if you just | ran 100 copies of your cheap-AI against a problem and | then had one cheap-AI (or maybe a medium-AI) judge the | results? | | Or at the risk of getting too anthropomorphic about it: | imagine you as a human are writing a program and you get | stuck on a tricky bit -- you know that the problem should | be solvable but you've never done anything similar and | don't know what algorithm to start with. Suppose then you | could tell your brain "Temporarily fork off 100 copies of | yourself. 10 of them go do a literature review of every | CS paper you can find related to this topic.
10 of you | search for open source programs that might have a similar | need and try to determine how their code does it. The | other 80 of you just stare off into the middle distance | and try to think of a creative solution. In two | human-seconds, write a summary of your best idea and exit. | I'll then read them all and see if I/we are closer to | understanding what to do next" | | For us, this type of mental process is so alien we can't | even imagine what it would feel like to be able to do. It | might come completely naturally to an AI, though. | not2b wrote: | Sometimes you do need to consider compute cost, say if | you want a small but high-quality model that can run on a | smart phone to perform a task. For example, with camera | input, identify a plant or animal, while in a remote area | with no cell signal, so it has to yield an answer without | communicating with a server. What's the smallest, most | efficient model that can do that effectively? Build that. | avereveard wrote: | > you don't need to memorize every fact in Wikipedia if you | know how to effectively search it. | | yeah, you're onto something. models good enough to sustain a | conversation where I bring my own data as a primer are | probably more useful than models that have a frozen | knowledge of everything. the killer feature of gpt-4 is the | 32k token size, which allows an unprecedented amount of input | to be fed into the knowledge graph and queried. | feanaro wrote: | Isn't it the case that we literally have no clue how GPT4 and | GPT3.5 are different in terms of training, given OpenAI | doesn't want to disclose anything at all? | typon wrote: | It's not true we know nothing. We know a little bit from | using the two models via their API. Given the time per | inference and the limit on messages per day for GPT4, I'm | willing to bet it's doing around 10x more compute than | GPT3.5. If that's because it has 10x more weights, I don't | know. But it wouldn't be a terrible guess.
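typon's back-of-envelope above can be made explicit. A rough sketch, assuming the ~10x compute ratio comes entirely from parameter count; both inputs are the thread's guesses, not disclosed figures:

```python
# Back-of-envelope from the thread: GPT-3 has ~175B parameters (the public
# figure); if GPT-4's observed ~10x inference cost were explained entirely
# by parameter count, the implied size would be:
gpt3_params = 175e9    # 175 billion parameters
compute_ratio = 10     # rough guess inferred from API latency / rate limits
gpt4_estimate = gpt3_params * compute_ratio
print(f"{gpt4_estimate:.2e}")  # prints 1.75e+12, i.e. 1.75 trillion
```

This is what feanaro's "1.75 trillion" reply below is doing in one step; as dwaltrip notes further down, inference time depends on more than parameter count, so the assumption is shaky.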
| feanaro wrote: | So your estimate is that GPT4 has 1.75 trillion weights? | dwaltrip wrote: | Is there anything that affects inference compute time | besides the number of parameters? Assuming same hardware, | etc. | typon wrote: | Yes - for example, adding memory to the attention | mechanism (similar to the RETRO or Memorizing Transformers | papers) | computerex wrote: | We don't have the details, it is true. But empirically, and | based on their report, gpt-4 is notably better than chatgpt. | feanaro wrote: | Better, yes, and for that we have evidence. But is the | improvement stemming simply from even more data? That's | what I'm questioning. | computerex wrote: | This paper is pretty approachable and goes over the | "scaling laws" in detail: | https://arxiv.org/abs/2206.07682 | | In short, yes. More data, higher-quality data, more | epochs on the data. That is the name of the game. | stevenhuang wrote: | It's speculated it has the same number of parameters, but | uses more compute and is multimodal. | UncleEntity wrote: | Free is better than $$/token imho. | | If you have a use case or a bunch of disposable income then | go with the "bitter" one. ___________________________________________________________________ (page generated 2023-03-24 23:01 UTC)