[HN Gopher] StableLM: A new open-source language model
___________________________________________________________________
StableLM: A new open-source language model
Author : davidbarker
Score  : 980 points
Date   : 2023-04-19 15:10 UTC (7 hours ago)
(HTM) web link (stability.ai)
(TXT) w3m dump (stability.ai)
| [deleted]
| Garcia98 wrote:
| I really dislike this approach of announcing new models that some
| companies have taken: they don't mention evaluation results or
| performance of the model, but instead talk about how
| "transparent", "accessible" and "supportive" these models are.
|
| Anyway, I have benchmarked stablelm-base-alpha-3b (the
| open-source version, not the fine-tuned one which is under an NC
| license) using the MMLU benchmark and the results are rather
| underwhelming compared to other open source models:
|
| - stablelm-base-alpha-3b (3B params): 25.6% average accuracy
|
| - flan-t5-xl (3B params): 49.3% average accuracy
|
| - flan-t5-small (80M params): 29.4% average accuracy
|
| MMLU is just one benchmark, but based on the blog post, I don't
| think it will yield much better results in others. I'll leave
| links to the MMLU results of other proprietary[0] and
| open-access[1] models (results may vary by +-2% depending on the
| parameters used during inference).
|
| [0]: https://paperswithcode.com/sota/multi-task-language-understa...
|
| [1]: https://github.com/declare-lab/flan-eval/blob/main/mmlu.py#L...
| mnkv wrote:
| How did you run the benchmarking, zero-shot or few-shot? I
| think a fair comparison would be Llama-7B which got an average
| ~35% for 5-shot.
| Garcia98 wrote:
| 5-shot prompting.
| jvm wrote:
| Doesn't make much sense to compare a model that's not fine
| tuned to flan models that are fine tuned. Makes more sense to
| compare to something like T5 base where it's probably a lot
| more comparable.
| Garcia98 wrote:
| I would have compared it to the fine-tuned version if it had
| been released under a truly open-source license. I think
| developers implementing LLMs care more about licensing than
| about the underlying details of the model.
|
| Also t5-base is 220M params vs 3B params of stablelm, not
| really a fair comparison anyways.
| Taek wrote:
| They used training data with restricted licenses to produce
| the fine-tuned models; it's not a decision that Stability
| was able to make.
| Garcia98 wrote:
| I disagree, they made the decision to use datasets with
| restrictive licensing, jumping on the
| alpaca/gpt4all/sharegpt bandwagon.
|
| They also chose to toot their horn about how open-source
| their models are, even though for practical uses half of
| their released models are no more open source than a
| leaked copy of LLaMA.
| peterstjohn wrote:
| So just use their base model and fine-tune with a
| non-restrictive dataset (e.g. Databricks' Dolly 2.0
| instructions)? You can get a decent LoRA fine-tune done
| in a day or so on consumer GPU hardware, I would imagine.
|
| The point here is that you can use their bases in place
| of LLaMA and not have to jump through the hoops, so the
| fine-tuned models are really just there for a bit of
| flash...
| sunshadow wrote:
| Demo is overloaded. Here is a Colab link if you wanna check the
| quality: https://colab.research.google.com/github/Stability-AI/Stable...
| [deleted]
| drexlspivey wrote:
| Your session crashed after using all available RAM.
| If you are interested in access to high-RAM runtimes, you may
| want to check out Colab PRO
| sunshadow wrote:
| You can try 8bit
| xkapastel wrote:
| Using 8-bit still runs out of RAM for both the 3B and 7B
| models. It's unclear if it's because it still uses more
| than the available RAM, or if it's just quietly not using
| 8-bit since it's not implemented.
| BudaDude wrote:
| This notebook will need high RAM and a premium GPU. It blew
| past the 16GB GPU that comes with Colab Pro.
| andy_xor_andrew wrote:
| > The context length for these models is 4096 tokens.
|
| !!! And I was excited that llama gave us 2048!!
| txtai wrote:
| Great to see the continued release of open models. The only
| disappointing thing is that models keep building on CC-BY-NC
| licensed datasets, which severely limits their use.
|
| Hopefully, people consider txtinstruct
| (https://github.com/neuml/txtinstruct) and other approaches to
| generate instruction-tuning datasets without the baggage.
| vagab0nd wrote:
| I look forward to a future where almost all software would allow
| me to plug in an LLM API:
|
| - browser: ad removal/skipping
|
| - RSS: information aggregation
|
| - recommendation systems
|
| - games: customized NPC scripts; AI opponents
|
| - home automation: personal butler
|
| Hopefully, there would be more than one base-layer LLM provider
| to choose from.
| rvz wrote:
| Finally, the true race to zero is now on.
|
| Stability and others are already at the finish line in the race
| to zero. ~~Open~~AI.com cannot get any cheaper and eventually
| will have to increase their prices.
|
| There is no escape and as I said before, open source AI models
| will eventually swamp and survive longer and integrate faster
| than even ~~Open~~AI.com's ChatGPT.
|
| Expect open source models to get smaller and even better, such
| that they will fit on an iPhone, on device.
|
| Stay tuned.
| [deleted]
| anticensor wrote:
| Why are you referring to OpenAI as ai.com?
| netsroht wrote:
| I think he's just emphasizing that OpenAI is in fact not
| open, thus it's crossed out.
| [deleted]
| jhoelzel wrote:
| Has anybody figured out the hardware requirements yet?
|
| This could be the beginning of something beautiful and I am
| always impressed by the work Stable Diffusion does!
|
| They are the true openAI (without empty promises in the middle
| ;))
| lanstin wrote:
| GPT4 supposedly cost USD 100M to train and seems much smarter
| than anything else released. Is there any chance to get this
| much money for open source training? Really big EU grants or
| some philanthropist or?
| dougmwne wrote:
| This is absolutely huge. LLaMA weights leaking was a big deal,
| but you couldn't actually use them without attracting Meta's
| ire. Would love to see some benchmarking vs. LLaMA and GPT.
| pyinstallwoes wrote:
| What is IRE?
| vineyardmike wrote:
| It's a noun meaning anger or wrath. Usually it's specific not
| just to triggering anger _the emotion_, but an angry
| _response_. So attracting Meta's ire means attracting an
| angry response from Meta - like the legal response users of
| LLaMA have triggered.
|
| It's not an acronym.
| dougmwne wrote:
| The real thing we should be asking is if the 3B or 7B model
| is capable of correctly answering this question.
| jwineinger wrote:
| Anger. Probably a lawsuit in this context.
| jmiskovic wrote:
| There were no lawsuits around LLaMA. They used DMCA
| requests to take down some GitHub and HuggingFace releases
| but the majority of releases (Alpacas and other animals)
| were left alone.
| It was enough to prevent commercial use, though. Edit: a
| lawsuit would be quite interesting, to clear up many things
| around how copyright works for LLMs.
| inportb wrote:
| Ire is the result of irritation.
| ElFitz wrote:
| Are you saying we shouldn't scratch Meta's itch?
| wsgeorge wrote:
| Oh this is good. :)
| sangnoir wrote:
| It's a department at Meta: Intellectual Rights Enforcement.
| /s
|
| "Ire" is a synonym for "anger" or "wrath"
| pyinstallwoes wrote:
| That's actually what I thought it might be. Thanks lol.
| antibasilisk wrote:
| Internal recurrent expenditure
| jacooper wrote:
| Do we even know if model weights are copyrightable or not?
| sebzim4500 wrote:
| No, but unless you enjoy being sued by Meta you have to act as
| if they are.
| janmo wrote:
| I was asking myself the same question and am 99% sure it
| isn't protected by IP laws. It is another story for the
| training data and the source code used to run the model.
| Invictus0 wrote:
| A JPG is copyrightable and that's just an array of numbers.
| Model weights seem pretty similar as a class.
| dougmwne wrote:
| It's not the JPG data you can copyright, but the
| recognizable image it produces. Every time you re-save or
| resize the image, the data changes, but the recognizable
| image remains.
|
| It's not clear how this process applies to model weights.
| Once you run another training epoch on them, the data has
| changed. What is the essential copyrightable, trademarkable
| or patentable thing that remains? A legally untested
| question for sure.
| AustinDev wrote:
| Couldn't you just transform the weights in some predictable
| way and then untransform them before use at runtime? The
| weights at rest would be completely distinct from the Meta
| weights.
| brucethemoose2 wrote:
| All sorts of llama weights are on huggingface now.
| prions wrote:
| > Supportive. We build models to support our users, not replace
| them. We are focused on efficient, specialized, and practical AI
| performance - not a quest for god-like intelligence. We develop
| tools that help everyday people and everyday firms use AI to
| unlock creativity, boost their productivity, and open up new
| economic opportunities.
|
| Refreshing take on the peak alarmism we see from tech "thought
| leaders"
| gumballindie wrote:
| Well exactly. AI _is_ a tool and a very good one at that.
| CamperBob2 wrote:
| I dunno... god-like intelligence would be pretty useful. I'll
| take a brochure.
| nashashmi wrote:
| do you trust god?
| CamperBob2 wrote:
| Well, ...
|
| OK, I withdraw the comment.
| sdenton4 wrote:
| Why should I need to? Isn't God on the blockchain? (j/k)
| garbagecoder wrote:
| Yeah, all the Terminator energy around these AI things is so
| off-putting. They aren't like that. They're big matrices and
| they are very cool tools!
| bbor wrote:
| What if big matrices are the last missing piece to research
| going on since the 50s...
| circuit10 wrote:
| But the concerns about AI taking over the world are valid and
| important; even if they sound silly at first, there is some
| very solid reasoning behind it. They're big matrices, yes,
| but they're Turing-complete, which means they can
| theoretically do any computational task.
|
| See https://youtu.be/tcdVC4e6EV4 for a really interesting
| video on why a theoretical superintelligent AI would be
| dangerous, and when you factor in that these models could
| self-improve and approach that level of intelligence it gets
| worrying...
| varunjain99 wrote:
| Well, it's to their benefit to portray their models as working
| alongside and enhancing humans, as opposed to replacing us. So
| it sounds a bit like marketing speak to me.
|
| And it's to the benefit of many of those tech "thought leaders"
| to be alarmist since they don't have much of the AI pie.
| seydor wrote:
| Doesn't sell as much, though
| reubenmorais wrote:
| This is just marketing. They're positioning themselves as
| somehow "more human" while building the exact same technology.
| When a model supports me by doing the work I'd otherwise hire
| someone to do, the model just replaced someone. And this goes
| without saying, but a large amount of outsourced tasks today
| don't exactly require "god-like intelligence".
| rmbyrro wrote:
| That was probably said about the automobile, when it replaced
| horses, or about electrical lamps, when they replaced
| oil-based lamps, no?
|
| I mean, every city had an army of people to light up and down
| oil lamps in the streets, and these jobs went away. But
| people were freed up to do better stuff.
| leovailati wrote:
| It is different this time. I bet that was also said when
| the transformations that you mentioned occurred, but _this
| time_ it really is different.
|
| LLM models are pretty general in their capabilities, so it
| is not like the relatively slow process of electrification,
| when lamplighters lost their jobs. _Everyone_ can lose
| their jobs in a matter of months because AI can do close to
| everything.
|
| I am excited to live in a world where AI has "freed" humans
| from wage slavery, but our economic system is not ready to
| deal with that yet.
| [deleted]
| antibasilisk wrote:
| > Refreshing take on the peak alarmism we see from tech
| "thought leaders"
|
| It's not alarmism when people have openly stated their intent
| to do those things.
| dragonwriter wrote:
| It's alarmism to support government regulation to reinforce
| the moat when industry leaders say they intend to do it, but
| also that the danger of it being done is why competition with
| them must be restricted by the State (and why they can't,
| despite being, or being a subsidiary of, a nonprofit founded
| on an openness mission, share any substantive information on
| their current models.)
| ericpauley wrote:
| I think there's a misunderstanding here (and on Alpaca) about
| exactly what _non-commercial_ means with respect to software
| licenses. Here's some related discussion:
| https://opensource.stackexchange.com/a/12074
|
| The CC-NC licenses cover modification and redistribution ("use"
| in the license). They apparently have no bearing on whether you
| can personally _run_ the software. And of course the outputs of
| the software are not copyrightable.
| GaggiX wrote:
| The models: https://huggingface.co/stabilityai/stablelm-base-alpha-3b,
| https://huggingface.co/stabilityai/stablelm-base-alpha-7b
|
| There are also tuned versions of these models:
| https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b
| https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b; these
| versions are fine-tuned on various chat and instruction-following
| datasets.
|
| The GitHub repo mentions that the models will be trained on 1.5T
| tokens; this is pretty huge in my opinion, as the alpha models are
| trained on 800B tokens. The context length is 4096.
| bhouston wrote:
| These models are huge. I assume they are not quantized down to
| 4bits yet.
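| (Until 4-bit quantized weights appear, one stopgap is 8-bit
| loading through Hugging Face transformers. A minimal sketch, not
| from the thread, assuming the accelerate and bitsandbytes
| packages are installed; the model name comes from the
| HuggingFace links above:)
|
|         from transformers import AutoModelForCausalLM, AutoTokenizer
|
|         model_name = "stabilityai/stablelm-tuned-alpha-7b"
|         tokenizer = AutoTokenizer.from_pretrained(model_name)
|         # load_in_8bit roughly halves memory vs fp16;
|         # device_map="auto" spreads layers across GPU(s) and CPU
|         model = AutoModelForCausalLM.from_pretrained(
|             model_name, device_map="auto", load_in_8bit=True)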
| brucethemoose2 wrote:
| Quantized versions will pop up on huggingface very soon, if
| they aren't already there. It takes basically no time, much
| less than something like an Alpaca finetune.
| nwoli wrote:
| I love stability ai so much
| enono wrote:
| [flagged]
| paxys wrote:
| OpenAI really needs to donate its name to Stability.
| Filligree wrote:
| The 7B model just barely fits in a 3090, but I've yet to get
| useful output; it usually gets stuck in a repetition loop.
|
| Spent some time playing with sampling parameters, to no effect.
| Has anyone else had a go?
| cheald wrote:
| If you're getting stuck in a sampling loop, try raising the
| temperature (0.6+) and/or increasing the repetition penalty
| (1.1-1.2 often work well).
| nlightcho wrote:
| Make sure you are using the instruction-tuned model. The base
| model will be difficult to prompt.
|
| It works in 8-bit with about 12GB of VRAM usage. Here's sample
| code:
|
| https://gist.github.com/AlexanderDzhoganov/a1d1ebdb018e2e573...
| dang wrote:
| https://github.com/Stability-AI/StableLM
| [deleted]
| quickthrower2 wrote:
| I found it a bit difficult to get answers from:
|
| Q. Can you recommend a good vegan restaurant in Sydney, near the
| water
|
| A. I'm not a vegan, so I don't have a favorite restaurant. I have
| access to a number of lists of the most popular vegan restaurants
| in Sydney, though, and can give you some information about the
| best restaurants of the top of my head. Please let me know if
| there's anything else I can do to assist.
|
| Q. Ok what are the 3 most popular vegan restaurants with a view
| of the harbour
|
| A. The most popular vegan restaurants near the water are:
|
| Q. Are what?
|
| A. The most popular vegan restaurants near the water are
|
| Q. Name a vegan restaurant
|
| A.
|
| 1. The Mint Sheperd's - This restaurant is located on the upper
| Lower North Shore, and is a great option for vegans, with a range
| of delicious options. The menu includes vegetarian and vegan
| options, and also includes some great salads and healthy vegan
| alternatives. 2. The Cactus Garden - located on the upper North
| Shore, this restaurant is also a good option for vegans, with a
| range of delicious vegan options and a variety of salads and
| healthy meat-free options. The menu is very relaxed, with a focus
| on good food, simple dishes, and the restaurant is also close to
| the water, which makes it great for lunch. 3. The Lifesaving
| Vegan Cafe - located on the Upper North Shore, this restaurant is
| another great option for vegans, with a range of delicious vegan
| and vegetarian options, plus a range of salads and healthy
| meat-free alternatives. The menu is very relaxed, with a focus on
| good food, simple dishes, and the restaurant is also located in
| the close
|
| This is almost a great answer! Just that these restaurants don't
| exist.
|
| I had 100x better experience with
| https://news.ycombinator.com/item?id=35583349, running locally!
| mousepad12 wrote:
| [dead]
| la64710 wrote:
| Thanks for breaking ground here and being truly OSS
| convery wrote:
| Just tested the 7B-tuned version on Huggingface and it's not
| great...
|
| > What features were added in C++ 98?
|
| < C++98 (also known as C++11) was a major development milestone
| for C++
| < new gcc 5.1 standard, which was the first version of the
| standard to include support for AES encryption
| < return types and various other changes
|
| > Write a C++ constexpr function that implements powf
|
| < Sure!
| Here is an implementation of a C++11 constexpr function for the
| function pow, using C++11 recursion and variadic templates:
|
| < void pow (int n) { std::cout << "Powerset of " << n << " is "
| << pow (n, n) << std::endl; }
|
| < This implementation of the pow function should be possible for
| any constant integer power of 2 without exception.
| brucethemoose2 wrote:
| Maybe you are not following the right syntax? LLaMA, Vicuna and
| Alpaca suffer if you don't format the query just right.
| [deleted]
| josephwegner wrote:
| Anyone have a sense for system requirements on running this
| locally? StableDiffusion was too heavy for either of my machines
| (which are admittedly not very powerful) - I'm not seeing much in
| the docs to indicate whether or not this is more or less
| intensive to run locally.
| ranguna wrote:
| The tuned 7B model is around 33 GBs, so you'll need a PC with
| that much VRAM or RAM. I haven't tried to load it on text
| generation ui though.
| wsgeorge wrote:
| If you can run any models on llama.cpp, that might be a good
| indicator of which StableLM models you'll be able to run.
|
| I easily ran 7B int4 ggml models on an MBP with 16 gigs of RAM.
| The same works on an MBA with 8 gigs of RAM, but you'll have to
| not run any other memory-hogging app.
| tucnak wrote:
| On an off note, can anybody tell me what's going on with
| embeddings and vector databases? Certainly it would seem that
| forward-pass completion is pretty much solved, and a smaller,
| better model will appear eventually. Let's say you even managed
| to solve both complete() and embed(), but what do you do with
| it? How are you going to organise, query, and multiply this
| dataset?
|
| Now, the question: I know that text-embedding-ada-002 has twice
| as many dimensions as mainstream Sentence transformers. Do we
| need all the extra dimensions? If not, how do I make it work
| better for my specific dataset with lots of jargon and
| abbreviations and stuff like that? What are the hardware
| requirements for that? I.e. could I do a fine-tuning job on
| some specific jargon-heavy text to get better embeddings for
| them?
|
| For one, the more I look into similarity-based use-cases the
| more I see that it's not normally speaking "top-percentile
| nearest-neighbour search", but the data is also terribly
| relational, i.e. it's probably like a slowly changing
| dimension, and there's a tree-traversal-type structure in how
| documents are generated as output from other documents as
| inputs? So you kind of have to think about these complete/embed
| ops both in aggregate (for batching) but also in particular,
| from a cost/reward ROI type calculation. Not just in aggregate
| but also in terms of memory usage patterns to further optimise
| layout-- tiering and stuff like that really comes to light.
|
| Also: vector database shilling on HN is getting out of hand;
| multiple companies literally plugging every mention on the
| radar, some actively begging for upvotes. Looking at it all
| makes you really appreciate pgvector[1] to a point where you
| would be more willing to buy 3.2 TB of high-bandwidth NVMe and
| dedicate it to a large IVF index than ever have to deal with
| all of this "purpose-built vector database" bullshit.
|
| [1]: https://github.com/pgvector/pgvector
| visarga wrote:
| > Do we need all the extra dimensions?
|
| This discussion seems relevant:
| https://www.reddit.com/r/MachineLearning/comments/12q8rp1/di...
| Der_Einzige wrote:
| Yes, you need all of the dimensions.
| All of the dimensionality reduction techniques, including SOTA
| ones (UMAP or better), are going to massively harm your
| embeddings.
| tucnak wrote:
| Perhaps I didn't word my question correctly; I'm looking to
| compare the capability of Sentence transformers vs. OpenAI
| Ada-based embeddings relative to their respective
| dimensionality?
| lxe wrote:
| Finetuning and training models of this size from scratch is now
| relatively inexpensive and accessible to anyone, especially with
| all these datasets becoming freely available. This is just the
| beginning of smaller LLMs that anyone can finetune to achieve
| high quality performance on particular tasks.
| antimatter15 wrote:
| Looks like Nolano.org's "cformers" includes a fork of
| llama.cpp/ggml by HCBlackFox that supports the GPT-NeoX
| architecture that powers EleutherAI's Pythia family of open LLMs
| (which also powers Databricks' new Dolly 2.0), as well as
| StabilityAI's new StableLM.
|
| I quantized the weights to 4-bit and uploaded it to HuggingFace:
| https://huggingface.co/cakewalk/ggml-q4_0-stablelm-tuned-alp...
|
| Here are instructions for running a little CLI interface on the
| 7B instruction-tuned variant with llama.cpp-style quantized CPU
| inference:
|
|         pip install transformers wget
|         git clone https://github.com/antimatter15/cformers.git
|         cd cformers/cformers/cpp && make && cd ..
|         python chat.py -m stability
|
| That said, I'm getting pretty poor performance out of the
| instruction-tuned variant of this model. Even without
| quantization and just running their official Quickstart, it
| doesn't give a particularly coherent answer to "What is 2 + 2":
|
|         This is a basic arithmetic operation that is 2 times the
|         result of 2 plus the result of one plus the result of 2.
|         In other words, 2 + 2 is equal to 2 + (2 x 2) + 1 + (2 x 1).
| jmatthews wrote:
| The AI war has officially begun.
| mach1ne wrote:
| No, that comes when China catches up in the quest for
| superintelligence.
| xyzzy4747 wrote:
| I don't think China has enough skilled AI workers to compete
| in any quick timespan. Top talent doesn't want to live and
| work there generally.
| sanxiyn wrote:
| China is demonstrably capable of training GLM-130B. The skills
| involved seem comparable to Stability AI's.
|
| https://keg.cs.tsinghua.edu.cn/glm-130b/
| [deleted]
| kriro wrote:
| I'm curious why you'd think that. China as a country has
| many people, to start. Some percentage of these people will
| end up in AI. Assuming people from all countries are
| roughly equally intelligent, the numbers clearly favor
| China. Universities over there are quite good, and there's a
| pretty strong "work hard" mentality I see from all our
| Chinese students. Plenty of Chinese are graduating or
| starting university these days during the AI hype peak.
| China as a country isn't sleeping on AI either. I think
| China as an AI hub looks quite promising. Anecdotally, China
| also retains quite a lot of talent, or people go abroad to
| study and return to China. Compared to some European
| countries or India that "leak" a lot of talent to the U.S.,
| I think China is quite a bit more stable.
|
| On the hardware side, things tend to be produced there as
| well.
| snordgren wrote:
| China definitely "leaks" a lot of talent to American
| companies - most AI papers that I've seen from respected
| Western universities include at least one Chinese name.
|
| One challenge for China has been the university
| enrollment rate.
| While in Western countries half of each cohort has been going
| to university for decades, China is not there yet. In 2019,
| just 17% of Chinese adults had degrees, compared to 44% in the
| US.
|
| So the large Chinese population is offset by its relative
| lack of access to education, while the US can draw from
| its own highly educated population in addition to
| attracting the best and the brightest from the rest of
| the world, including China.
| HybridCurve wrote:
| There are plenty of authoritarians in the US to conduct
| warfare against our institutions that are up to speed. We
| don't need to wait for anyone else overseas to get this party
| started.
| ChatGTP wrote:
| Stop worrying about China...wow
| rafark wrote:
| I'm more than ready. It's exciting to see what can come out of
| this.
| andy_xor_andrew wrote:
| One question I have is: can we expect GPTQ quantization down to
| 4 bits (or lower) for these models, the same as for llama
| (which is partly why llama was so amazing)?
|
| I'm sure the answer is "yes", but I wonder if there's anything
| that can be done that could make these models more "friendly" to
| quantization like this from the very beginning.
|
| It's absolutely incredible that you can cut the size of a model
| in half... and then cut the size of the model in half *again*...
| and it fits on your consumer GPU with only a small cut in
| accuracy.
| brianjking wrote:
| What is the actual LICENSE?
|
| * HuggingFace shows CC-BY-NC:
| https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b
|
| * GitHub is Apache 2.0:
|
| https://github.com/Stability-AI/StableLM/blob/main/LICENSE
| Taek wrote:
| NC is just for the fine-tuned models
| GistNoesis wrote:
| I'm worried about the version hell of relying on HuggingFace's
| transformers.
|
| In the model card, config.json [1] and generation_config.json
| there are strings "transformers_version": "4.28.1", and it
| refers to some common architecture, "GPTNeoXForCausalLM".
|
| Which, if I follow the string soft link correctly, means it
| refers to this file [2] with a long history of commits,
| including some recent ones [3].
|
| I don't understand how to get out of the version hell. I can
| install and pin a version of the transformers library, but
| sooner or later, if I chain different foundation models, I'll
| have to do this for each different model card that has its own
| version.
|
| And then those transformers library versions are probably
| dependent on some pytorch versions.
|
| I don't understand how you can reliably track a bug or a change
| in behavior, or guarantee that the model that you spend
| millions of dollars training doesn't rust, stays reproducible,
| and doesn't become unusable due to this mess.
|
| (And also the weights are a de-serialized pickled Python
| dictionary, which is, from a safety point of view, equivalent
| to running untrusted binaries and very worrying on its own.)
|
| [1] https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b/b...
| [2] https://github.com/huggingface/transformers/blob/v4.28.1/src...
| [3] https://github.com/huggingface/transformers/commits/v4.28.1/...
| lhl wrote:
| I've been diving in lately, and while it's not efficient, the
| only way to manage is to create a new conda/mamba
| environment, or a custom Docker image, for all the conflicting
| packages.
|
| For safety and speed, you should prefer the safetensors format:
| https://huggingface.co/docs/safetensors/speed
|
| If you know what you are doing you can do your own conversions:
| https://github.com/huggingface/safetensors or, for safety,
| https://huggingface.co/spaces/diffusers/convert
| brucethemoose2 wrote:
| > And then those transformers library versions are probably
| dependent on some pytorch versions.
|
| They are not, and I don't think the model even cares about the
| transformers version. I run git transformers/diffusers and
| PyTorch 2.1 in all sorts of old repos, and if it doesn't
| immediately work, usually there's just small changes to APIs
| here and there that make scripts unhappy, and that you can
| manually fix.
| jxy wrote:
| In terms of pickled files, we may ask them to release the
| models in safetensors. That would be a big improvement.
|
| There's not much we can do about dependencies on pytorch or
| other python libraries. Perhaps people can make more
| independent implementations. The redundancy in implementations
| would help.
| substation13 wrote:
| I'm surprised that some of the cutting edge AI work is not
| being done in pure Rust. Why even have a binding layer?
| ElFitz wrote:
| Python seems to have won over the Data / ML world.
|
| So I'd wager they use what they and their intended audience
| know.
| tough wrote:
| Someone implemented llama in Go recently:
| https://github.com/gotzmann/llama.go
|
| I wouldn't rule out a Rust implementation of some LLM
| architecture at some point.
| Philpax wrote:
| Hi! I'm a maintainer of
| https://github.com/rustformers/llama-rs. We're planning
| to expand our model support soon.
| sdenton4 wrote:
| The pickles drive me crazy.
|
| Tensorflow saved models are a great way to solve the
| problem... Save the computation graph and weights, and drop
| all the crusty code dependencies. I think ONNX models are
| similar. I expect there should be a Jax equivalent at some
| point, as Jax is basically perfectly designed for this
| (everything is expressed in lax operations, which allows
| changing implementations for cpu/gpu/tpu freely... So just
| save the list of lax ops).
| seydor wrote:
| When will we see ggml versions of these?
|
| Good job on OpenAI to sell out in 2022. It was truly the end of
| the line.
|
| No matter how bad these model releases are, they are certain to
| get awesome soon with everybody hacking around them. The
| surprising success of MiniGPT-4 with images shows that OpenAI's
| GPTs don't have some magic secret sauce that we don't know of.
| sebzim4500 wrote:
| GPT-4 is still much, much better than any of the competition.
|
| I guess we'll see once we have a 175B version of StableLM
| though; presumably that will at least easily beat GPT-3.
| franzypants wrote:
| So, it can write very coherently, but I'm not sure it excels at
| coding tasks.
|
| For example:
|
| User: Write Python code for evaluating the integral of f(x)=x^2.
|
| Assistant: Sure, here's an example of Python code to evaluate
| the integral of a function `f(x) = x^2`:
|
| ```
| # Define the function
| def integral_of_f(x): return x**2
|
| # Evaluate the integral
| result = integral_of_f(3)
|
| print("The integral of f(x) = x^2 is:", result)
| ```
|
| In this code, we define the function `integral_of_f` as a
| function that takes a single argument `x` and returns the
| integral of `x^2`. We then evaluate the integral by calling the
| `integral_of_f` function and passing in the value of `x` as an
| argument.
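| (For contrast with the output above: the generated code never
| integrates anything; it just evaluates f(3). A minimal sketch of
| a correct answer, not from the thread, assuming the sympy
| package is available:)
|
| ```
| import sympy as sp
|
| x = sp.symbols("x")
| f = x**2
|
| # Symbolic antiderivative: x**3/3
| F = sp.integrate(f, x)
|
| # Definite integral over [0, 2]: 8/3
| result = sp.integrate(f, (x, 0, 2))
| print("Antiderivative:", F)
| print("Integral over [0, 2]:", result)
| ```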
| vikp wrote:
| It's fantastic that more orgs are releasing open-source models
| trained on more than 300B or so tokens. Here's my take from the
| details I could find.
|
| Pros:
|
| - 4096 context width (vs 2048 for llama, gpt-j, etc)
|
| - 3B to 65B released or in progress
|
| - RL tuned models available
|
| - Trained on more tokens than existing non-llama models
|
| - 128 head dim, so can use flash attention (unlike GPT-J)
|
| Cons:
|
| - No benchmarks released, or details about the model
|
| - Somewhat restrictive license on the base models, and NC
| license on the RL models
|
| - Small models only trained on 800B tokens, compared to 1T for
| llama-7B, and potentially more for other upcoming alternatives
| (RedPajama, etc). I'd like to see their loss curves to see why
| they chose 800B.
|
| High-level, this is likely to be more accurate than existing
| non-llama open source models. It's hard to say without
| benchmarks (but benchmarks have been gamed by training on
| benchmark data, so really it's just hard to say).
|
| Some upcoming models in the next few weeks may be more accurate
| than this, and have less restrictive licenses. But this is a
| really good option nonetheless.
| HarHarVeryFunny wrote:
| They mention 1.5T training tokens, perhaps for the largest
| model only?
| vikp wrote:
| It's unclear which models will be trained to 1.5T tokens. The
| details of how many tokens each model saw in training are on
| GitHub - https://github.com/stability-AI/stableLM/ - but only
| for the ones that have been released.
| thunderbird120 wrote:
| I just asked a Stability employee and they said that the
| current models ran into an overfitting issue, probably due
| to some duplicated data somewhere in their dataset, which
| consists of 1.5T tokens. The 800B tokens is the number of
| tokens they've been trained on so far. The plan is to keep
| going and train on the rest of the data once the issue is
| resolved.
| HarHarVeryFunny wrote:
| I've asked this question in a few places, and never been
| able to get an answer; maybe you know...
|
| Q: Why are these LLMs trained on a single epoch, and why
| do they perform worse if the dataset is repeated?
|
| This seems maybe related to suspecting data duplication
| as a cause of overfitting.
|
| Why don't LLMs need multi-epoch training at a low
| learning rate to generalize? If they are managing to
| learn from a single epoch, that sounds more like they may
| be memorizing!
| sebzim4500 wrote:
| > - No [...] details about the model
|
| You can see the model architecture here:
|
| https://github.com/Stability-AI/StableLM/blob/main/configs/s...
| lhl wrote:
| FYI, I'm running lm-eval now w/ the tests Bellard uses
| (lambada_standard, hellaswag, winogrande, piqa, coqa) on the
| biggest 7B on a 40GB A100 atm (non-quantized version, requires
| 31.4GB), so it will be directly comparable to what various
| LLaMAs look like: https://bellard.org/ts_server/
|
| (UPDATE: the run took 1:36 to complete, but failed at the end
| with a TypeError, so I will need to poke and rerun.)
|
| I'll place results in my spreadsheet (which also has my
| text-davinci-003 results):
| https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
| guywithabowtie wrote:
| Do you also have results of GPT4 somewhere? Or
| text-davinci-003-turbo?
| lhl wrote:
| I'm still on the waitlist for GPT-4 API access.
| Note that text-davinci-003 cost about $90 to benchmark at
| $0.02/1K tokens, so if you're able to use a GPT-4 model (for
| completion and not just instruction) that'll probably be
| $270-$540 in credits to benchmark...
| lunixbochs wrote:
| Are you using https://github.com/EleutherAI/lm-evaluation-harness?
| lhl wrote:
| Yeah, although it looks like it currently has some issues with
| coqa: https://github.com/EleutherAI/lm-evaluation-harness/issues/2...
|
| There's also the bigscience fork, but I ran into even more
| problems (although I didn't try too hard):
| https://github.com/bigscience-workshop/lm-evaluation-harness
|
| And there's https://github.com/EleutherAI/lm-eval2/ (not
| sure if it's just starting over w/ a new repo or what?) but
| it has limited tests available.
| sebzim4500 wrote:
| How possible is it that every other model suffers from
| dataset contamination and this model is being unfairly
| penalized for having properly sanitized training data?
| swyx wrote:
| > 128 head dim, so can use flash attention (unlike GPT-J)
|
| Mind explaining why this is so attractive/what the hurdle is
| for the laypeople in the audience? (me)
| GaggiX wrote:
| Standard attention has memory quadratic in sequence length,
| whereas FlashAttention has memory linear in sequence length.
| Also, FlashAttention is faster.
| WithinReason wrote:
| According to the paper, FlashAttention also needs quadratic
| memory:
|
| Let N be the sequence length, d be the head dimension, and
| M be the size of SRAM, with d <= M <= Nd. Standard attention
| (Algorithm 0) requires Θ(Nd + N^2) HBM accesses, while
| FlashAttention (Algorithm 1) requires Θ(N^2 d^2 M^-1) HBM
| accesses.
| GaggiX wrote:
| https://github.com/HazyResearch/flash-attention#memory
|
| "standard attention has memory quadratic in sequence
| length, whereas FlashAttention has memory linear in
| sequence length."
|
| I guess you have just reported how many times the layer
| will need to access the memory, not how much memory usage
| scales with sequence length.
| sroussey wrote:
| So there must be a downside to FlashAttention. What is it?
| lhl wrote:
| https://arxiv.org/abs/2205.14135 - Section 5 suggests
| that the biggest limitation is that custom CUDA kernels
| need to be coded on a per-GPU-architecture basis.
| kiraaa wrote:
| installing it is a nightmare
| fpgaminer wrote:
| FlashAttention is mathematically identical to standard
| attention, so in theory there's no downside. In practice,
| numerical inaccuracies of floating point mean that the
| results differ slightly. I don't know of any papers going
| in depth to analyze what impact those variances have in a
| range of real models, but generally speaking deep models
| handle slight variances well. I've not noticed any
| difference in my applications training models. And tons
| of people use FlashAttention as a drop-in replacement on
| models trained on standard attention (e.g. using xformers
| in StableDiffusion).
|
| Also, in practice FlashAttention is still relatively new,
| so it isn't well supported in libraries yet. Until
| PyTorch 2.0 you had to either implement it yourself, or
| use something like xformers, which comes with a bag of
| caveats. PyTorch 2.0 now has it built-in, and it's easy
| to use, but the implementation is incomplete so you
| can't, for example, use it with an attention mask (which
| is needed in LLMs).
|
| tl;dr: Basically none, but it just isn't well supported
| yet.
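| (A minimal sketch of the PyTorch 2.0 built-in mentioned above,
| not from the thread; assumes PyTorch >= 2.0 and a CUDA GPU.
| PyTorch dispatches to the FlashAttention kernel when the shapes
| and dtypes permit:)
|
| ```
| import torch
| import torch.nn.functional as F
|
| # (batch, heads, sequence length, head dim); head dim <= 128
| q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
| k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
| v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
|
| # is_causal=True applies the causal mask an LLM decoder needs,
| # without materializing an explicit attention mask
| out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
| ```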
| whimsicalism wrote:
| > Small models only trained on 800B tokens, compared to 1T for
| llama-7B
|
| LLaMA is trained far beyond chinchilla optimality, so this is
| not as surprising to me.
| anentropic wrote:
| According to this, LLaMA still didn't go far enough:
| https://www.harmdevries.com/post/model-size-vs-compute-overh...
| whimsicalism wrote:
| Yep, it depends on what your goal is.
| dragonwriter wrote:
| But Chinchilla optimality, while an interesting result, is a
| strange target for most practical purposes. Training is one
| time, inference is many times; not training past the point
| where it's cheaper to train a larger model for the same
| (proxy for) quality discounts the cost of inference to zero.
| whimsicalism wrote:
| Yep, but if Stability has the goal of training the best
| possible model then that would explain the choices they
| made.
| GaggiX wrote:
| I mean, 800B tokens on a 3B model and 7B model is still
| way beyond the Chinchilla scale.
| burtonator wrote:
| Were you able to figure out if the RL models are going to be
| jailed? A 65B parameter model could be a bit frightening.
| That's 1/3rd the size of GPT-3.
| sebzim4500 wrote:
| I'm sure there will be a bunch of different RL tuned versions
| of them; RLHF isn't that expensive. IIRC Microsoft has
| software that will do it for a few thousand dollars for a
| model that size. I'm sure someone will release a
| non-lobotomized version, maybe OpenAssistant.
| kiraaa wrote:
| it's not always about the size, but yeah it's really good!
| beecafe wrote:
| [dead]
| DustinBrett wrote:
| I'm wondering what the sweet spot for parameters will be. Right
| now it feels like the MHz race we had back in the CPU days, but
| 20 years later I am still using a 2-3GHz CPU.
| lhl wrote:
| I think the "sweet spot" is going to depend on your task, but
| here's a good recent paper that may give you some more
| context on thinking about training and model sizes:
| https://www.harmdevries.com/post/model-size-vs-compute-overh...
|
| There have also been quite a few developments on sparsity
| lately. Here's a technique, SparseGPT, which suggests that you
| can prune 50% of parameters with almost no loss in
| performance, for example: https://arxiv.org/abs/2301.00774
| version_five wrote:
| I was wondering if the longer training thing was a similar
| phenomenon to the double descent we see in other deep
| learning models. Training for a really long time can
| improve generalization (as can adding more parameters) -
| but I don't know enough about LLM architecture to know if
| that's relevant here. My skim of the blog post led me to
| think it's proposing a different mechanism (scaling laws).
| capableweb wrote:
| > - 3B to 65B released or in progress
|
| Seems they want to do 3B to 175B, although 175B is not in
| progress yet.
| ipsum2 wrote:
| It's not efficient to do 175B. Training a smaller model (65B)
| on more data gives better performance for the same compute.
| tempaccount420 wrote:
| If you want it to just regurgitate training data, sure. But
| more parameters will always be better for more complex
| tasks.
| thewataccount wrote:
| > But more parameters will always be better for more
| complex tasks.
|
| I think you should check out this paper, which discusses
| the relationship of performance and the ratio of training
| tokens to parameter count:
|
| https://arxiv.org/abs/2203.15556
| tempaccount420 wrote:
| StableLM already has an optimal parameter number to
| tokens ratio, so what's your point?
| They should train the 65B model on even more tokens?
|
| > StableLM is trained on a new experimental dataset built
| on The Pile, but three times larger with 1.5 trillion
| tokens of content
| thewataccount wrote:
| If I understand correctly, based on their prediction in
| Table 3 on page 8, they do have enough tokens, but they
| also need over a magnitude more compute time.
|
| > It's not efficient to do 175B. Training a smaller model
| (65B) on more data gives better performance for the same
| compute.
|
| This is OP's comment you replied to - so I was responding
| under OP's context that the amount of compute time would
| be the same, which I apologize I didn't make clear, and
| my response was very poorly worded.
|
| My intent was to link the paper because I think it
| supports OP's statement that for the same amount of
| compute time and a token ratio, the performance of a
| smaller model will be better than a larger one (assuming
| they haven't converged yet, which they haven't at this
| size).
|
| > If you want it to just regurgitate training data, sure.
|
| This paper was about showing Chinchilla performing on par
| with models many times larger than itself, showing you
| don't need a 175B-size model for more performance than
| "regurgitating training data".
| sebzim4500 wrote:
| Depends on your compute budget.
| kiraaa wrote:
| and also easy to deploy
| GaggiX wrote:
| > Small models only trained on 800B tokens
|
| "These models will be trained on up to 1.5 trillion tokens." on
| the GitHub repo.
|
| https://github.com/stability-AI/stableLM/#stablelm-alpha
| youssefabdelm wrote:
| That's great news, but one would think that since they're
| behind Stable Diffusion, they'd use the insights behind
| it and scale data even more than that, to result in better
| quality at a smaller-scale model that can run on most
| people's machines.
|
| Like... try 10 trillion or 100 trillion tokens (although that
| may be absurd, I never did the calculation), and a long
| context, on a 7B parameter model, then see if that gets you
| better results than a 30B or 65B parameter model on 1.5
| trillion tokens.
|
| A lot of these open source projects just seem to be trying to
| follow and (poorly) reproduce OpenAI's breakthroughs instead
| of trying to surpass them.
| GaggiX wrote:
| > try 10 trillion or 100 trillion tokens
|
| Computation is not free and data is not infinite.
| youssefabdelm wrote:
| You could've said the same to OpenAI when they were
| scaling GPT from 1 billion to 175 billion parameters.
| We're all grateful they didn't follow that line of
| thought.
|
| But Stability does have access to a pretty big cluster,
| so it's not paying cloud compute (I assume), so cost will
| be less, and data of course is not infinite... I never
| stated that.
|
| But considering 3.7 million videos are uploaded to
| YouTube every day, and 2 million scientific articles are
| published every year, yada yada... that argument falls
| apart.
|
| At the very least, implement spiral development... 1
| trillion... 3 trillion... (oh, it seems to be getting WAY
| better! There seems to be a STEP CHANGE!)... 5
| trillion... (holy shit this really works, let's keep
| going)
| dragonwriter wrote:
| The training corpus is the problem. An extra trillion
| tokens is (ballpark) an extra million KJV Bibles' worth of
| text formatted for ingestion. And you probably picked all
| of the low-hanging fruit, in terms of quality, prior
| vetting, and being in a standard format for ingestion, in
| your first trillion tokens of training data.
| dragonwriter wrote:
| > Like... try 10 trillion or 100 trillion tokens (although
| that may be absurd, I never did the calculation)
|
| But where's the corpus supposed to come from?
| Taek wrote:
| Devs confirmed that the small ones use 800B; 1.5T is for the
| large ones.
| GaggiX wrote:
| @thunderbird120 asked a Stability employee, who said that the
| plan is to keep training the models up to 1.5T. So I don't
| know where you read this.
| nickthegreek wrote:
| https://github.com/Stability-AI/StableLM#stablelm-alpha
| shows that the 3B and 7B had 800B training tokens.
| Taek wrote:
| That may be, but the weights you can download today were
| trained on 800B.
| sroussey wrote:
| I think they are "checkpoint" models in this case.
|
| Will be fun to compare when completed!
| GaggiX wrote:
| Yes, of course; that's why they use "will be trained" on
| the GH repo.
| IronWolve wrote:
| Are the XL 1024x1024 enhanced models released for Stable
| Diffusion yet? That looks really good from the testing.
| nologic01 wrote:
| Is there something like Moore's law for LLMs that will
| eventually turn them into ubiquitous compute?
| matthasfun wrote:
| I recently downloaded Diffusion Bee to my Mac for image
| generation with Stable Diffusion.
|
| Excited for when I can do the same with StableLM.
| beecafe wrote:
| [dead]
| transcriptase wrote:
| Missed opportunity for Stability to call themselves "ClosedAI"
| and then be the opposite of what the name suggests.
| leaving wrote:
| Well that was a trip.
|
| I asked StableLM-Tuned-Alpha-7b to translate the Swedish phrase
| "Skaplig bogpulka" into English (because ChatGPT and New Bing
| refuse to do so as a result of their puritanical US censorship)
| and it confabulated wildly.
|
| It made up an Estonian folk song and somehow we ended up on the
| etymology of the well-known Swedish word "Skeletor".
|
| This one is not ready for prime time, but I have hopes. Someone
| please make a model that doesn't censor. I won't be paying one
| thin dime for this stuff until it is censorship-free.
| [deleted]
| arpowers wrote:
| Why does every LLM use "Hope this email finds you well" to start
| virtually every letter and email?
| ShamelessC wrote:
| Have you tried something like:
|
| "Never start an email with 'Hope this email finds you well'"
|
| in your first prompt?
| cmsj wrote:
| For people wanting to run it locally, you can fit the 7B model
| (just) into a 24GB VRAM GPU (e.g. 3090/4090). The 3B model
| appears to be much more reasonable, but I would say the output
| is... of limited quality, based on the few tests I've run thus
| far.
| cmsj wrote:
| Gist that mostly just takes the notebook Stability AI have in
| the GitHub repo and turns it into a script you can run locally
| after installing a few dependencies from pip:
|
| https://gist.github.com/cmsj/2d6b247ad4fc8f15011105feeda763e...
| millimeterman wrote:
| I suspect the community will start creating lower
| precision/quantized versions of the model very quickly. LLaMA
| 30B quantized to 4 bits is runnable on a 3090/4090.
| janmo wrote:
| You don't need a GPU to run the model; you can use your RAM and
| CPU, but it might be a bit slow.
| cmsj wrote:
| It's very slow, and for the 7B model you're still looking at
| a pretty hefty RAM hit whether it's CPU or GPU. The model
| download is something like 40GB.
| nwoli wrote:
| No way, I thought this was months away still. So excited
| MacsHeadroom wrote:
| Completed training is. These are early "Alpha" checkpoints.
| visarga wrote:
| It's released but still cooking in the oven as we speak.
| msoad wrote:
| I feel "The Pile" needs a lot more to match what OpenAI has
| trained their models on. The mysterious "Books2" and so on.
| nickthegreek wrote:
| Great to see Stability release this with a great license as
| well. Any idea on the use cases for the 3B model? Will a model
| that small suffer heavily from 4-bit quantization?
| wsgeorge wrote:
| > Will a model that small suffer heavily from 4-bit
| quantization?
|
| From my experience with quantized 7B llama models, avoid 3B if
| you can. Without benchmarks, I think this is a decent rule of
| thumb.
| capableweb wrote:
| Generating either a 0 or 1 in response to something? Maybe
| useful in some testing scenarios? Honestly don't know either.
| New lorem ipsum?
| visarga wrote:
| You can use it as the assistant model to a large model; it's
| called speculative sampling. You generate text with the small
| model and validate with the large one, ensuring no deviation
| occurs. Speedup of 2.5x.
|
| https://arxiv.org/abs/2302.01318 (DeepMind)
| MacsHeadroom wrote:
| That would be 1-bit quantization. In reality, quantization
| under 8 bits is done in smart ways which result in higher
| effective output quantization and lower effective memory-size
| quantization.
|
| For example, bucketing identical groups of weights and then
| reusing one bucket for all the identical groups lowers the
| effective bit quantization at the memory level while
| retaining the uncompressed quantization quality.
|
| There is literature on effective quantization levels below 1
| (so-called "0-bit" quantization). But even then the actual
| weights are typically 2-4 bits and there is just a lot of
| reuse of weights going on.
|
| Another neat trick is to line bins of weights up in order of
| lowest to highest weights and compute a function to produce a
| zero offset throughout the bins of weights, such that 0010 in
| one bin and 0010 in another bin are upsampled to 16-bit
| and then have different offsets added, maintaining
| uncompressed 16-bit performance without the memory overhead.
|
| There are many more tricks like this and many still to be
| found!
| jerrygoyal wrote:
| what's the knowledge cut-off for this?
| alexb_ wrote:
| Absolutely a giant fan of Stability sticking to actual open
| source licenses and not licenses that impose restrictions on
| what you can use it for. This is the future of AI! Beware of
| any org that uses "ethical" licenses - they are not open
| source. Stability is one of the few organizations that actually
| cares about free software; you love to see it.
| cjbprime wrote:
| > These fine-tuned models are intended for research use only
| and are released under a noncommercial CC BY-NC-SA 4.0 license,
| in-line with Stanford's Alpaca license.
|
| This is a no-commercial-use-allowed license; it is neither
| considered free software nor open source, the definitions of
| which disallow restrictions on what you can use the work for.
| renewiltord wrote:
| Interesting. A non-LLM hallucinating. And to think we used to
| believe that was only a property of LLMs.
| circuit10 wrote:
| Huh? Everything they said is true, isn't it?
| renewiltord wrote:
| StableLM, the model that's theirs, is CC-BY-SA 4.0.
| [deleted]
| freedomben wrote:
| The two sentences prior are important:
|
| > _We are also releasing a set of research models that are
| instruction fine-tuned.
| Initially, these fine-tuned models will use a combination of
| five recent open-source datasets for conversational agents:
| Alpaca, GPT4All, Dolly, ShareGPT, and HH. These fine-tuned
| models are intended for research use only and are released
| under a noncommercial CC BY-NC-SA 4.0 license, in-line with
| Stanford's Alpaca license._
|
| The snippet you quoted is not talking about the main model in
| the announcement. It's talking about fine-tuned models based
| on other models. Stability has to respect the license of the
| originals. They cannot change it.
|
| The main model is described higher up in the post and _is_
| permissible for commercial use:
|
| > _Developers can freely inspect, use, and adapt our StableLM
| base models for commercial or research purposes, subject to
| the terms of the CC BY-SA-4.0 license_
| btown wrote:
| It also appears that CC BY-SA-4.0 is GPL-compatible. Not a
| lawyer and this is not legal advice, but it certainly seems
| like one could operate their own StableLM server/service
| and allow proprietary code to use it over a network
| interface, much like one could use a GPL-licensed database
| system.
|
| https://creativecommons.org/faq/#can-i-apply-a-creative-comm...
| seydor wrote:
| Not their fault; the instruct-tuned models depend on non-open
| data... which should be open, however. Scraping ChatGPT is
| legal.
| sanxiyn wrote:
| Agreed. Scraping ChatGPT is against OpenAI's terms of use,
| and OpenAI is entitled to terminate your access immediately
| upon notice, but since ChatGPT output is not copyrighted
| (and not copyrightable), output you acquired before
| termination should be freely redistributable. I am not sure
| why the Stanford Alpaca authors think otherwise, but they
| are wrong.
|
| https://openai.com/policies/terms-of-use
|
| Thank you for using OpenAI! These Terms of Use apply when
| you use the services of OpenAI, L.L.C. (snip) By using our
| Services, you agree to these Terms. (snip) You may not
| (iii) use output from the Services to develop models that
| compete with OpenAI. (snip) We may terminate these Terms
| immediately upon notice to you if you materially breach
| Sections 2 (Usage Requirements).
| whywhywhywhy wrote:
| "Ethics" will only ever be an excuse to lock this technology
| behind one company's paywall. The only ethical AI is actually
| free and open AI; how it's trained is irrelevant imho as long
| as we can all benefit. The negatives of the work of individuals
| being used to train it are outweighed by the negatives of one
| company just doing that and holding the power within their
| walls.
| version_five wrote:
| Yeah, I wish there was more real investigation / analysis into
| who is behind various "ethical AI" pushes and what they stand
| to gain from it. From what I can see, many of the people
| involved either are invested in companies that will somehow
| certify your AI is ethical, or just want to stifle
| competition so they can catch up. Of course there's also a
| sprinkling of "current thing" supporters.
| Xelynega wrote:
| I have to disagree. Especially in the case of LLMs, where new
| API services are popping up all over the place, an "ethical"
| license like the AGPL that requires the source be shared for
| web services would accelerate development of the space as a
| whole immensely.
| EamonnMR wrote:
| I think when they say ethical they're talking about RAIL,
| which has clauses about not doing bad-guy stuff with their
| models, not AGPL.
| shostack wrote:
| How is this sort of thing audited?
| I imagine there are all sorts of lifestyle AI businesses that
| won't give two shits about a license where people can't easily
| see or audit what is being used.
| burtonator wrote:
| I am definitely not in favor of OSS licenses for models >=
| GPT-3...
|
| Not unless they're aligned well.
|
| There are all sorts of horrible use cases that these could be
| used for.
| circuit10 wrote:
| For the worst use cases, like scams, why would they respect
| the license? Though that probably applies for some legal but
| immoral use cases.
| cheald wrote:
| "Alignment" is just a euphemism for "agrees with me", though.
| Humans aren't even aligned with each other. Demanding that AI
| models be "aligned" is essentially a demand that AI only be
| produced which agrees with your priors.
| risho wrote:
| it is true that there are concerns relating to open source
| and AI, but surely having them be closed off, manipulated,
| and controlled by untrustworthy corporations is worse.
| wahnfrieden wrote:
| they like the particular techbro that runs the corporation
| currently. people will find all sorts of ways to lick the
| boot of private ownership.
| welshwelsh wrote:
| The ideal alignment for a computer program is to be aligned
| completely with the user, prioritizing their needs to the
| exclusion of all others.
| orra wrote:
| I am very happy to see them use a true FLOSS licence. However,
| it's a surprise to me, given Stable Diffusion is proprietary,
| using one of those "ethical" licences.
| r0l1 wrote:
| How will they generate revenue?
| earthboundkid wrote:
| I tooted this yesterday:
|
| Selling access to LLMs via remote APIs is the "stage plays on
| the radio" stage of technological development. It makes no
| actual sense; it's just what the business people are accustomed
| to. It's not going to last very long. So much more value will
| be unlocked by running them on device. People are going to look
| back at this stage and laugh, like paying $5/month to a
| cellphone carrier for Snake on a feature phone.
|
| Web apps:
|
| - Need data persistence. Distributed databases are really hard
| to do.
|
| - Often have network effects where the size of the network
| causes natural monopoly feedback loops.
|
| None of that applies to LLMs:
|
| - Making one LLM is hard work and expensive. But once one
| exists you can use it to make more relatively cheaply by
| generating training data. And fine-tuning is more reliable than
| one-shot learning.
|
| - Someone has to pay the price of computation power. It's in
| the interest of companies to make consumers pay for it up front
| in the form of a device.
|
| - Being local lets you respond faster and with access to more
| user-contextual data.
| s3p wrote:
| What does "stage plays on the radio" mean?
| crucialfelix wrote:
| When radio first started, people read plays written for the
| stage, because that's what they knew and what they had. Later,
| people learned to write for the medium and make radio-native
| entertainment.
|
| The same thing happened when TV arrived. They did live versions
| of the radio entertainment on a set in front of a camera.
| [deleted]
| what_ever wrote:
| Trying to be fancy to say things won't be running locally.
| The orchestra/performers won't be live in front of you but on
| the radio.
| throwawayadvsec wrote: | I don't think it's going to happen in the next few years. | | The prices are gonna drop like hell, but ain't no way we run | models meant to run on 8 Nvidia A100s on our smartphones in the | next 5 years. | | Just like you don't store the entirety of Spotify on your | iPhone, you're not gonna run any decent LLM on phones any time | soon (and I don't consider any of the small LLaMAs to be decent). | w4ffl35 wrote: | > but ain't no way we run models meant to run on 8 nvidia | A100 on our smartphones in the next 5 years | | M$ has been working on an AI chip since 2019, so I think we will. | throwawayadvsec wrote: | An A100 is about the size of a brick; there is no way we're | fitting those 8 bricks in a phone in the next five years, | without even thinking about heat management. | asciimike wrote: | An A100 HGX server is ~6kW of power consumption (and | associated heat), while an iPhone is O(1W). I agree that | a 6000x increase in energy density or 6000x decrease in | power consumption is unlikely in this decade. | d33 wrote: | They still can't beat the laws of physics. It's just the | kind of density that doesn't seem feasible to put in a | smartphone. | wewtyflakes wrote: | Human brains seem to be of the same order of magnitude in | terms of size, and seem more nuanced and capable than | GPT-4. | what_ever wrote: | And? Nvidia has been working on AI chips for years. | davnicwil wrote: | The only thing I can say to this is that Apple seem | laser-focused on tuning their silicon for ML crunching, that | that focus is clearly now going to be amped up further still, | and that in tandem the software itself will be tuned to Apple | silicon. | | GPUs on the other hand are pretty general purpose. And 5 | years on a focused superlinear ramp-up is a long time; lots | can happen. I am not saying it's 100%, or even 80% likely. | It'll be super impressive if it happens, but I see it as well | within the realms of reason. | brucethemoose2 wrote: | The M2 and the 4090 are both very general purpose. In fact, | the 4090 allocates proportionally more silicon area to the | tensor cores than Apple allocates to the neural engine. | | The M series is basically the only "big" SoC with a | functional, flexible NPU and big GPU right now, which is | why it seems so good at ML. But you can bet actual ML- | focused designs are in the pipe. | zamnos wrote: | Well, and gobs of RAM, which only top-end cards can | compete with. | tehsauce wrote: | Apple's new M2 Max has a neural engine which can do 15 | trillion flops. Nvidia's A100 chip (released almost 3 | years ago) can do 315 trillion flops. Apple is not going to | close this 20x gap in a few years. | moffkalast wrote: | > this 168x gap | | FTFY; remember it takes 8 of those to even load the | thing (8 x 315 / 15 = 168). And when the average laptop has | that much compute, GPT-4 will seem like Cleverbot in | comparison to the state of the art. | sroussey wrote: | At some point, they will put the models in silicon. I'm | curious as to when... 5yr? | viraptor wrote: | That doesn't sound likely with the current architectures. | There may be some kind of specialisation, but an NN is like | the chip-design nightmare. We can't do chips with that | many crossed lines. It's going to have to keep the | storage+execution-engine pattern unless we have some | breakthroughs. | | "More specialised than GPU" is the game for now. | davnicwil wrote: | right, it's a huge challenge.
| | I think the tuning-the-models-to-the-hardware piece is | important, and of course there is much more incentive to | do this for Apple than Nvidia because of the distribution | and ecosystem advantages Apple has. | | But also, I don't know... let's see what the curve looks | like! It's only been a couple of years of these neural | engines. Let's see how many flops the M3 can hit this year. | And then the M4 the next. Again, 5 years is a long time | when real improvement is happening. I am | optimistic. | refulgentis wrote: | Curious, why do you think that? My knowledge is limited to | marketing material and my M2 vs my 3090, and my conclusion | so far would be that that's in every hardware maker's | marketing claims of the past couple of years. | viraptor wrote: | > but ain't no way we run models meant to run on 8 nvidia | A100 on our smartphones in the next 5 years | | When I learned about neural networks, the general advice at | the time was "you'll only need one hidden layer, with | somewhere between the number of your input and output | neurons". While that was more than 5 years ago, my point is: | both the approach and the architecture change over time. I | would not bet on what we won't have in 5 years. | Analog24 wrote: | This is the reason why they're not going to move on-device | anytime soon. You can use compression techniques, sure, but | you're not going to get anywhere near the level of | performance of GPT-4 at a size that can fit on most consumer | devices. | mkaic wrote: | I think we'll see completely new architectures dominate in | the near future, ousting the transformer. I strongly | suspect that, while impressive, transformers use several | orders of magnitude more compute than is "needed" for the | tasks they perform--if for no other reason than that the | human brain performs similarly and it only draws 20 watts! | And it isn't even an engineered system, just the product of | a very, very long history of natural selection! I fully | anticipate that we'll see AI in the near future that | achieves human-level performance on sub-human power budgets | like the ones you'd be constrained by on a phone :) | Analog24 wrote: | "Near future" is very ambiguous. At the moment there is | nothing even close to transformers in terms of | performance. I suspect you are right in general but I'm | not sure about the "near future" part; there needs to be | a pretty significant paradigm shift for that to happen | (which is possible, of course, I just don't see any hints | of it yet). | brucethemoose2 wrote: | Vicuna (LLaMA) 13B is pretty good IMO. A 20B model can | definitely fit in RAM on future devices. | WhiteNoiz3 wrote: | I agree - I think for security and privacy we need it to be on- | device (either that or there needs to be end-to-end encryption | with guarantees that data won't be captured for training). | There are tons of useful applications that require sensitive | personal information (or confidential business information) to | be passed in prompts; that becomes a non-issue if you can run | it on device. | | I think there will be a lot of incentive to figure out how to | make these models more efficient. Up until now, there's been no | incentive for the OpenAIs and the Googles of the world to make | the models efficient enough to run on consumer hardware. But | once we have open models and weights there will be tons of | people trying to get them running on consumer hardware.
| | I imagine something like an AI-specific processor card that | just runs LLMs and costs < $3000 could be a new hardware | category in the next few years (personally I would pay for | that). Or, if Apple were to start offering a GPT-3.5+ level LLM | built in that runs well on M2 or M3 Macs, that would be strong | competition and a pretty big blow against the other tech | companies. | ForHackernews wrote: | ? | | All software is sold as SaaS today, because it's more | profitable. The same will be true for LLMs. | coffeebeqn wrote: | LLMs also need persistent storage for best performance. | Otherwise you're always starting at square one. | circuit10 wrote: | At least for now, good LLMs need GPUs that cost tens of | thousands, so a cloud API is the only reasonable option. It | definitely makes a lot of sense. | andrewcamel wrote: | On this topic, Apple is the sleeping giant. Sleeping tortoise, | maybe. Everyone else has been fast out of the gates, but Apple | has effectively already been positioning to leapfrog everyone | after a decade+ of M1 chip design. Ever since these chips | launched, the M1 chips have felt materially underutilized, | particularly their GPU compute. Have to believe something big | is going on behind the scenes here. | | That said, wouldn't be surprised if the truth was somewhere in | between cloud-deployed and locally deployed, particularly on | the way up to the asymptotic tail of the model performance | curve. | ohgodplsno wrote: | M1 GPUs are barely real-world tested, alright chips. They're | far from being a sleeping giant. | smoldesu wrote: | What would a "leapfrog" look like, in your mind? I'm | struggling to imagine how they're better positioned than the | competition, especially after llama.cpp showed us that | inference acceleration works with everything from AVX2 to ARM | NEON. Compared to Nvidia (or even Microsoft and ONNX/OpenAI), | Apple is somewhat empty-handed here. They're not out of the | game, but I genuinely see no path for them to dominate | "everyone". | yunwal wrote: | My guess is a leapfrog would have more to do with how LLMs | are integrated into an operating system, rather than just | coming out with a better model. I don't think we're gonna | get a substantially more capable LLM than GPT-4 anytime | soon, but fine-tuning it to sit on top of the core of an | operating system could yield results. | smoldesu wrote: | Feels like Microsoft already beat them to the punch. | Their ONNX toolkit has better ARM optimization than | Apple's own PyTorch patches, and their collaboration with | OpenAI places them pretty far ahead of the research | curve. I'm convinced Microsoft could out-maneuver Apple | on local or remote AI functionality, if they wanted to. | jrm4 wrote: | I think it may be naive to believe that the deciding | factor in how these things are used will be "chip | speed" or "efficiency on the machine". | | I wish we were in that world; but it seems more likely to be | "Which company jumps ahead quickest to get mindshare | on a popular AI-related thing, and is then able to ride scale | to dominate the space?" | | REALLY hope I end up being wrong here; the fact that so many | models are already out there does give me some hope. | GavinB wrote: | I'd suspect it will actually accelerate moving everything into | the cloud. | | If your entire business is in the cloud, you can give an AI | access to everything with a single sign-on or some passwords.
If | half is in the cloud and half is local, that's very annoying to | have all in context for your AI assistant. And there's no way | we're getting everything locally stored again at this point! | bugglebeetle wrote: | Right, this is why StabilityAI is getting in bed with Amazon, | so private, fine-tuned models can operate on all your data | sitting out there in S3 buckets or whatever. | fshbbdssbbgdd wrote: | Most of your bullets sound like arguments that local models | will win. | | > Often have network effects where the size of the network | causes natural monopoly feedback loops. | | This one in particular sounds like an argument that remote | models will win. | max51 wrote: | I don't think that's true in the context of businesses, because | they won't want their data to be leaked and/or used for other | clients. The more data from your company you can feed the AI, | the more productive it will be for you. I'm not just talking | about semi-public documentation, but also things like emails, | meeting transcripts, internal tool APIs, employee details, | etc. | | If the AI service provider uses your data to help better | train their AI, it will be blacklisted by most companies. If | you keep the data in silos, the centralisation will offer almost | no benefit while still being a very high privacy risk. The | only benefit they get is that it allows them to demo it and | see its potential, but no serious business will adopt it | unless you also provide a self-hosted solution. | | I think the only people who will truly benefit from using | cloud services as a long-term solution are personal users and | companies too small to afford the initial cost of the | hardware. | goatlover wrote: | That seems hard to believe for businesses which already | rely on Office, Teams and SharePoint, since Microsoft will | be making its version of ChatGPT available for all its | products, and the integration will be too hard to pass up | on. | max51 wrote: | Microsoft is in a different situation because everyone is | already forced to trust them with their OS and O365. For | better or for worse, there are no current alternatives to | Windows and the Office suite for most businesses. If you | already log in to your OS with a Microsoft account and | process your data in Excel, adding an AI tool on top of | it is not a big jump. Very few others are in this | situation. | | For every other AI service provider, good fucking luck | getting clients to trust you. I expect we will see a lot of | AI services that offer a cheap and easy-to-use cloud AI | subsidized by a very expensive self-hosted version. I | also expect a lot of data leaks and many high-profile | incidents where an AI creates a document or code that | includes sensitive data from someone else (hard-coded | passwords, API keys, etc.). | | Even for a large company like Autodesk or Adobe, you | might trust them with your engineering drawings and your | new product design, but would you feel comfortable | uploading your code base for internal tools, employee | files, email communications, etc. to them? It's gonna be | a hard no for a lot of businesses. | precompute wrote: | This is mostly why the future of computation only makes sense | monetarily if you have everyone shift to a thin client. So, | banning GPUs is likely considered a "necessary evil" by the | BigTech cognoscenti for accomplishing that goal. | qingdao99 wrote: | > But once one exists you can use it to make more relatively | cheaply by generating training data. | | Is that a real technique?
Why not just shrink down the model | itself directly somehow? Is that not possible? | vsareto wrote: | I think you'll be paying a cloud provider instead of an API | company, but it'll still be a monthly service charge for the | service. | | I don't see small/medium companies getting into acquiring | hardware for AI. | losvedir wrote: | This doesn't seem technically feasible to me. The state of the | art will for a long time require a lot more hardware to run | than is available on a consumer device. | | Beyond which, inference _also_ benefits from parallelization, | not just training, so being able to batch requests is a | benefit, and more likely when access is offered via an API. | FL33TW00D wrote: | This isn't correct. It's 100% feasible to run very capable | models on consumer devices. | | I wrote up a feasibility investigation last year: | https://fleetwood.dev/posts/a-case-for-client-side-machine-l... | crazygringo wrote: | But there's a big difference between running "very capable" | models that use most available resources and consume tons | of power and run slowly... | | ...versus running the best models available, in a few | seconds, without using up the memory the main app you're | using needs for running. | | These are all mainly going to be run remotely for general | consumer usage for quite a while, I think. | FL33TW00D wrote: | How many applications on your machine right now are GPU- | accelerated? There is a chunk of silicon that spends | almost all its time doing nothing. | moffkalast wrote: | > very capable models | | Well that's the problem though: those models don't come anywhere | close to being _useful_ at all. At least not yet. And they | also run much slower. | | As compute increases in general, there will be larger and | more capable state-of-the-art models and it'll make more | sense to just use those instead of trying to run some local | one that won't give you any useful answers. Data centers | will always have a few orders of magnitude more horsepower | than your average laptop, even with some kind of inference | accelerator card. | FL33TW00D wrote: | Stable Diffusion isn't useful at all? | circuit10 wrote: | "Very capable" and "state of the art" aren't the same | thing. In this case the gap is huge. | FL33TW00D wrote: | This is true, but a good subset of tasks can be | accomplished by much smaller models. Copilot is 12B! | frozenlettuce wrote: | I ran Alpaca 7B on a 2015 Acer laptop (it was slow, but | worked!) | brucethemoose2 wrote: | I can already run Vicuna (LLaMA) 7B on my 2020, 14" PC laptop | at ~3.5 tokens/sec, and more speed can definitely be squeezed | out. | | Most future laptops and phones will ship with NPUs next to | the CPU silicon. Once they get enabled in software, that | means a 16GB machine can run a 13B model, or a 7B model with | room for other heavy apps. | | As for the benefits of batching and centralization, that is | true, but it's somewhat countered by the high cost of server | accelerators and the high profit margins of cloud services. | burtonator wrote: | We're stuck here for a while due to the size, and cost, of the | larger models. | | The main reason I want a non-cloud LLM is that I want one | that's unaligned. | | I know I'm not a criminal and I want to stop being reprimanded | by GPT-4. | | What I'm most interested in here is fine-tuning the model with my | own content. | | That could be super valuable, especially if we could get it to | fact-check itself, which you could do with a vector database.
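As a concrete illustration of the vector-database idea in the comment above, here is a minimal sketch of retrieval-grounded prompting: embed your own documents, index them, and prepend the best match to the prompt so the model's answer can be checked against (or conditioned on) your content. It assumes the sentence-transformers and faiss-cpu packages; the documents, encoder name, and query are illustrative.

    # Minimal sketch of grounding an LLM with a vector index (assumptions above).
    import faiss
    from sentence_transformers import SentenceTransformer

    docs = [
        "StableLM base models are released under CC BY-SA-4.0.",
        "The fine-tuned StableLM models are research-only (CC BY-NC-SA 4.0).",
    ]

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = encoder.encode(docs, normalize_embeddings=True)

    # Inner product on normalized vectors = cosine similarity.
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)

    query = "Can I use the StableLM base model commercially?"
    q = encoder.encode([query], normalize_embeddings=True)
    scores, ids = index.search(q, 1)

    # Prepend the retrieved passage so the model answers from your own content.
    context = docs[ids[0][0]]
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
    print(prompt)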
| shostack wrote: | What's been so interesting about the explosion of this is how | prominently the corporate-driven restrictions have been | highlighted in the news and such. | | People are getting a good look, in very easy-to-understand | terms and at the foundational stage, at how limiting a future | it is to have this just be another big-tech-controlled thing. | WhiteNoiz3 wrote: | They have said that the alignment actually hurts the | performance of the models. Plus, for creative applications | like video games or novels, you need an unaligned model, | otherwise it just produces "helpful" and nice characters. | skybrian wrote: | This is sort of like saying the world wide web is a fad. Many | people made that argument, but a lot of desktop apps got | replaced by websites even though they were supposedly inferior. | | ChatGPT works fine as a website and you don't need to buy a new | computer to run it. You can access your chat history from any | device. For many purposes, the only real downside is the | subscription fee. | | If LLMs become cheaper to run, websites will be cheaper to | run, and there will be lower-cost competition. Maybe even cheap | enough to give away for free and make money from advertising? | zirgs wrote: | The other downside starts with an "As an AI language | model..." | ModernMech wrote: | And ends with "...I'm afraid I can't do that." | lairv wrote: | This doesn't seem that obvious to me; serving LLMs through an | API allows for highly optimized inference with stuff like | TensorRT and batched inference, while you're stuck with batch | size = 1 when processing locally. | | LLMs don't even require full real-time inference; there are | applications like VR or camera stuff where you need real-time | <10ms inference, but for any application of LLMs 200-500ms is | more than fine. | | For the users, running LLMs locally means more battery usage | and significant RAM usage. The only true advantage is privacy, | but this isn't a selling point for most people. | pornel wrote: | You're still thinking in terms of what APIs would be used | for, rather than what local computation enables. | | For example, I'd like an AI to read everything I have on | screen, so that I can ask at any time "why is that? Explain!" | without having to copy-paste the data and provide the whole | context to a Google-like app. | | But without a privacy guarantee (and I mean a technical one, not | a pinky promise to be broken when VC funding runs out) | there's no way I'd feed _everything_ into an AI. | brucethemoose2 wrote: | We are very close to optimized ML frameworks on consumer | hardware. | | And TBH most modern devices have way more RAM than they need, | and go to great lengths to just find stuff to do with it. | Hardware companies also very much like the idea of heavy | consumer applications. | [deleted] | thatwasunusual wrote: | Is this a joke? I tried the HuggingFace demo, and it just spews | out ridiculous replies for everything. | | Examples: | | >> What can you tell me about Therese Johaug? | | > The person you're asking about is Therese Johaug (The Judge) a | Norwegian jazz violinist. She played with many famous musicians, | including and especially Charles Mingus. She was born in 1954 in | the town of Kolsas. She is married to the violinist Mark | Sorensen. | | Everything in that statement is wrong. She is a Norwegian | cross-country skier, born in 1988 in Dalsbygda, and not married. | | Ok. Let's try something easier. | | >> What can you tell me about Edvard Munch?
| | > -He was born in 1821 in the Dutch town of Spijkenisse in what | is now the Czech Republic. | | > -He studied art at the Academy of Fine Arts in Brussels, | Belgium, where he made a portrait of a young girl called Anna. | | > -He then moved to Norway to practice with his friends and Dano- | Norwegian painter, Christian Krohg, the two of them known as "The | Scream" or "Kristine in the Black". | | > -There he met the painter, Ambroise Thomas, who became a very | close friend of Munch's and would often take him to the beach to | watch the boats. | | > -One day in 1883 Munch bought a house at a deserted place in | the outskirts of Oslo, in the southern region of Norway. | | > -The house, called "La Chute", had no inhabitants at all, it | was just a huge pile of rocks, but this Munch loved it so much | that he named it "Lochte". | | > -He took the name "Munch" because he thought it was German, but | it was totally French, he had the idea that he could paint the | ocean and the sky, so he named his new painting "Moulin Sur | l'Oiseau". | | > -At the same time, he would soon start creating works of art | that would be strange and not make any sense, for instance, one | summer day in 1870 he painted what is known now as "The Scream", | it has many other paintings with the same title, "The Scream" is | a drawing of a young girl, she is holding an ax and is crying in | the scream position. | | > -He had the idea of "The Scream" in one sitting in 1883, but it | took six months to paint. | | > -He stayed in France in the studio of Christian Krohg when he | was still Edouard de Badeley, one of the greatest painters of his | time. | | > -In 1891 he returned to Norway to France and lived there for | the rest of his life. | | > -He died in Paris in 1900. | | Christ on a bike. | | My favourite must be that "The Scream is a drawing of a young | girl, she is holding an ax and is crying in the scream position." | tmikaeld wrote: | Agreed, it doesn't even do the basic stuff that GPT-2 did. How | can anyone be impressed by this? | franzypants wrote: | I get the sense that any accurate portrayal of the model is | getting downvoted. Sure, it's going to be a while before open- | source models catch up with OpenAI. I'm glad that Stability is | working on this, and I look forward to trying out their larger | models that they're training right now, but we should be able | to accurately discuss the capabilities of an LLM without just | automatically jumping on the hype train. | ShamelessC wrote: | That's fair, but this: | | > Is this a joke? I tried the HuggingFace demo, and it just | spews out ridiculous replies for everything. | | Is a shallow and uninformed criticism. The parameter count | alone should inform people that this isn't going to match | GPT-3.5 or GPT-4. Not everyone knows that of course, but | starting a comment with "Is this a joke?" is precisely the | wrong way to start such a discussion. | adrianmonk wrote: | Yeah, I've tried a few things, and so far I've gotten no | answers I'd consider acceptable. Here are some examples. | | ----- | | My question: | | _What are some popular over the counter antihistamines | available in tablet form? Please name three, and for each one, | give the generic and brand name, and NO OTHER INFORMATION AT | ALL._ | | Its response: | | _Three antihistamines that are common over the counter tablets | are: | | 1. Claritest | | 2. Zyrtec | | 3. Marzolast | | Brand names for these tablets are: | | 1.
Claritest - Contains diphenhydramine hydrochloride, which is | an expectorant and cold and allergy relief medicine. | | 2. Zyrtec - Contains diphenhydramine hydrochloride, which is an | expectorant and antispasmodic medication for motion sickness. | | 3. Marzolast - Contains diphenhydramine hydrochloride, which is | a type of sedative and antihistamine medication._ | | Things wrong with this answer: | | * Claritest isn't an antihistamine. It's a non-invasive | prenatal DNA screening test. | (https://www.genpathdiagnostics.com/patients/womens-health/ge...) | | * Marzolast isn't an antihistamine. It isn't anything. It only | has one meaningful hit on Google, which is a coupon code for a | Udemy course. | | * It says Zyrtec contains diphenhydramine hydrochloride, which | is an antihistamine, but the wrong one. Zyrtec contains | cetirizine. | | * It gives three different, conflicting explanations for what | diphenhydramine is. The last one is right, and the first two | are totally wrong. | | * I asked it (pretty clearly!) to give no other information, | but it gave a lot. | | ----- | | My question: | | _What film made red staplers famous?_ | | Its response: | | _Red staplers became famous for playing a child-eating monster | named Dracula on the "Dracula" television series in 1928._ | | Things wrong with this answer: | | * Television series in 1928?! There basically weren't any | except an experimental one called The Queen's Messenger. | | * Red staplers isn't an actor. There's no playing anybody. | | * I'm pretty sure Dracula doesn't eat children. Wrong type of | monster. | | ----- | | My question: | | _What's the difference between a fork and a spoon?_ | | Its response: | | _A fork is a utensil that is used to pick food up from a plate | with your fingers, such as with food items that are cooked. | Fingers are made of many different parts, including bones, | muscles, and tendons. The two most basic kinds of utensils are | spoons and forks. Forking is typically done by the dining | creature's hands and includes handling both the bread and the | fish._ | | Things wrong with this answer: I'm not even going to bother | this time. | sroussey wrote: | "also fine-tuned the model with Stanford Alpaca's procedure using | a combination of five recent datasets for conversational agents: | Stanford's Alpaca, Nomic-AI's gpt4all, RyokoAI's ShareGPT52K | datasets, Databricks labs' Dolly, and Anthropic's HH. We will be | releasing these models as StableLM-Tuned-Alpha." | | They have released the 3B and 7B of both the base and instruction- | tuned models. 30B and 65B are in training and will be released later. | dmm wrote: | Issue to add support for these models to llama.cpp: | https://github.com/ggerganov/llama.cpp/issues/1063 | aketchum wrote: | Are they releasing the weights for download? The links to | HuggingFace in the readme are giving me 404. This dataset they | built on top of "The Pile" sounds interesting - looking forward | to evaluating their claim that 3-7 billion param models can | perform on par with 175 billion param GPT-3. | artyomxyz wrote: | It looks like it's available here: | https://huggingface.co/stabilityai/stablelm-base-alpha-7b/tr... | turingfeel wrote: | Did they claim this? I didn't see that claim made in the above | post. | aketchum wrote: | "The richness of this dataset gives StableLM surprisingly | high performance in conversational and coding tasks, despite | its small size of 3 to 7 billion parameters (by comparison, | GPT-3 has 175 billion parameters)."
| | So they did not explicitly say it is comparable, but | implicitly compared the two. I'm curious to evaluate what | "surprisingly high performance" means exactly. | mightytravels wrote: | I love the installation of this! Just open a Python window and it | downloads whatever it needs. Well done! | dr_dshiv wrote: | How can one get started adding RLHF to something like this? | coolspot wrote: | Open-assistant.io | seydor wrote: | Pretty soon there are going to be desktop apps for that. | jonplackett wrote: | Anyone know if this will be added to the Stability API? | lxe wrote: | The model is very good. It looks like it's using the GPT-NeoX | architecture, and the fine-tuned 7B version is very quick and | beats everything else in the category (Vicuna, OASST, | OpenChatKit, GPT4All, etc.) in my "write a poem about cheese" | benchmarks. | sebzim4500 wrote: | I like how "fun" benchmarks like yours seem to be so much more | effective at judging model performance than all the traditional | NLP benchmarks. | | I still think the best way to compare two models is to simulate | a rap battle between them; then it's immediately obvious who | wins. | coolspot wrote: | Yep, a rap battle is how I knew Bard is nowhere close to | ChatGPT. | | In the past the whole world was watching Kasparov vs Deep Blue. | This time we will do Eminem vs LLM. | | What a time to be alive! | SeanAnderson wrote: | I'm curious if this will give better results than LLaMA 7B? LLaMA | 7B felt like a toy that, while cool to be able to run locally, | did not feel useful in any way when contrasted with the state of | GPT. Here's hoping for better and/or release of larger-parameter | models with low performance requirements soon :) | | EDIT: my first question times out when run online; seems like | huggingface is getting hugged to death. | nwoli wrote: | They have people from Anthropic on their team, which has stuff | like Claude Instant, likely running a very light model; i.e., | the tricks from Anthropic have likely been incorporated into | the model they're presenting here. | HarHarVeryFunny wrote: | If you're comparing to ChatGPT performance then Vicuna 13B | would be the best comparison point for something LLaMA-based. | bugglebeetle wrote: | Vicuna 13B performance is an order of magnitude below ChatGPT | for all but gimmicky conversational stuff. Try giving both | somewhat large, task-based prompts with steps and see what | happens. | HarHarVeryFunny wrote: | Maybe I should have phrased that better! I didn't mean that | Vicuna was comparable to ChatGPT, just that it's the best | LLaMA-based comparison you can make (since it's at least | been conversationally trained). | dragonwriter wrote: | > Vicuna 13B performance is an order of magnitude below | ChatGPT for all but gimmicky conversational stuff. | | Until you connect it to external resources, I tend to think | of anything you do with "brain-in-a-jar" isolated ChatGPT | as gimmicky conversational stuff. | goatlover wrote: | ChatGPT is still going to be way more capable when you | use its API to connect to external resources. | levesque wrote: | Isn't ChatGPT a 165B parameter model? | sanxiyn wrote: | No. OpenAI haven't disclosed the parameter count of GPT-3.5 or | GPT-4, which are the models used by ChatGPT. You may be | thinking of GPT-3, which is indeed a 175B parameter model. | levesque wrote: | Ah, interesting. Thought GPT-3.5 had the same structure | as GPT-3, for some reason. GPT-4 would obviously be | different.
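For anyone wanting to reproduce the kind of "write a poem about cheese" test mentioned above, here is a minimal sketch of running the tuned 7B model locally with Hugging Face transformers. It assumes enough GPU memory for fp16 weights (roughly 16 GB) and uses the <|SYSTEM|>/<|USER|>/<|ASSISTANT|> prompt format described on the StableLM-Tuned-Alpha model card; the system message and sampling settings are illustrative.

    # Minimal sketch of local inference with the tuned model (assumptions above).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "stabilityai/stablelm-tuned-alpha-7b"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.float16, device_map="auto"
    )

    prompt = (
        "<|SYSTEM|>You are a helpful assistant."
        "<|USER|>Write a poem about cheese.<|ASSISTANT|>"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs, max_new_tokens=128, do_sample=True, temperature=0.7
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))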
| pksebben wrote: | Even if it doesn't initially, the fact that it's being released | so permissively is massive - Stable Diffusion was made far more | powerful by being hackable at all levels, and I can't imagine we | won't see the same here. | | I imagine things like ControlNets that restrict output to | parsable types, LoRA-style adaptations that allow mixable | "attitudes", that sort of thing. | | Very different underlying architecture from diffusers, ofc. But | the action of open source is the same - a million monkeys with | a million xterms and so forth. | edkennedy wrote: | I'm really hoping for the ability to load in different sets | of trained material as embeddings/textual inversions like in | Stable Diffusion. Imagine scanning in some of your favorite | philosophy and design books and throwing them in with a small | weighting as a little flavor for your answers. The crossovers | between LLMs and Stable Diffusion-type models (like LoRAs) are | such a fascinating space to explore. | cheald wrote: | This has been around for GPT models for a while in the form | of "soft prompts", which are roughly analogous to textual | inversion in the Stable Diffusion space. | alex_sf wrote: | This already exists for LLaMA; I'm sure it'll be | implemented for StableLM as well (a sketch of the idea | follows below). | | https://github.com/lxe/simple-llm-finetuner | anononaut wrote: | My prayers have been answered. | IceWreck wrote: | StabilityAI is the real OpenAI. Thanks for this. | diminish wrote: | Well said. OpenAI is a promise unkept. Thanks StabilityAI for | existing. | nashashmi wrote: | When has open source ever spearheaded independent innovation? | It usually follows along. | | Fred Wilson once did a take on all trends in SV. First some | firm comes out with a product that changes the landscape and | makes a massive profit. Then some little firm comes along and | does the same for a cheaper price. Then some ambitious group | out of college comes out with an open-source version of the | same. | | Open source has never been a trailblazer of innovation. Open | "research" was the original mantra for OpenAI. And an | entrepreneur in residence put together a great product. If | they were any more open, it would not make sense. | bugglebeetle wrote: | > Open source has never been a trailblazer of innovation. | | Except for, you know, all the major programming languages | and Linux, which make all that innovation possible in the | first place. Also, everything OpenAI is doing is based on | open-source stuff from Google and others, so... | nashashmi wrote: | True. Should have added that open research has given lots of | puzzle pieces for innovation. | | And open-source products have led to many individual | contributions. | | But again, it's never been a trailblazer for innovation. | sainez wrote: | This could not be more untrue. | | The world is littered with businesses that operate as | commercial wrappers around open-source technology. Ever | heard of GitHub? What about macOS? AWS? | capableweb wrote: | Unfortunately, due to the law of names, StabilityAI will in | the future hit the same issue as OpenAI and do a 180, | unleashing very unstable AI on the world. | mike_hearn wrote: | More like Stability will turn out to be an unstable | company. Last we heard they were struggling to raise more | funding and might lose their CEO due to unclear business | models: | | https://www.semafor.com/article/04/07/2023/stability-ai-is-o...
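To ground the LoRA discussion above, here is a minimal sketch of attaching low-rank adapters to the base model with the peft library, in the spirit of the simple-llm-finetuner repo linked earlier. The hyperparameters are illustrative, and "query_key_value" is the attention projection name used by GPT-NeoX-style models, which StableLM reportedly follows; an actual run would still need a training loop and a permissively licensed instruction dataset (e.g. Dolly 2.0).

    # Minimal sketch of LoRA adaptation with peft (assumptions above).
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        "stabilityai/stablelm-base-alpha-3b", device_map="auto"
    )

    config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,               # rank of the low-rank update matrices
        lora_alpha=16,     # scaling applied to the update
        lora_dropout=0.05,
        target_modules=["query_key_value"],
    )

    model = get_peft_model(base, config)
    # Only the small adapter matrices train; the base weights stay frozen.
    model.print_trainable_parameters()
    # From here, fine-tune with a standard transformers Trainer.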
| paxys wrote: | The company can cease operations tomorrow, but the model | they open-sourced (and all of its derivatives built by | the community) will continue to exist. If OpenAI | disappears then all of the work they have done goes with | it. | IceWreck wrote: | I'd rather have that than closed AI models controlled by | a few corporations. | seydor wrote: | then they both have ironic names now | jacooper wrote: | Not fully. The instruction-tuned model is CC-BY-NC-SA. | alex_sf wrote: | That's a limitation of the dataset used for that particular | tuned model. Probably not a great choice on their part, given | that people aren't reading past the headline, but the actual | base model is not restricted. | dragonwriter wrote: | It's CC-BY-NC-SA because of the upstream sources used for | instruction training. There are open resources being developed | for that, from what I've seen, but probably nothing ready yet. | Kelamir wrote: | What does this mean in the context of | https://futureoflife.org/open-letter/pause-giant-ai-experime... , | where the CEO of Stability AI signed the petition (he's 8th | on the list)? Do they want time to catch up? | | Regardless of that, I'm glad that StabilityAI enters the field as | well and releases models for public use. | sanxiyn wrote: | It doesn't mean anything. The letter says: | | > Therefore, we call on all AI labs to immediately pause for at | least 6 months the training of AI systems more powerful than | GPT-4. | | StableLM is not an AI system more powerful than GPT-4, so the | pause does not apply. | samstave wrote: | To whom, specifically, does the "pause" apply? | | Because, I can tell you that no state actor is going to | pause, but amplify. | | Israel, Iran, China, Russia and any self-respecting NATO | country is secretly pushing their AI as fast as fn possible. | | You think the US is pausing anything with a 1 trillion dollar | defense budget, especially when this AI has surfaced? | | The NSA has been projecting these capabilities forever.... | | Look at the movie "Enemy of the State" as a documentary on | capabilities as early as 1998... now look at the fractal | spiral that we are witness (and victim) of. | moonchrome wrote: | All these state actor stories imply that state actors are | anywhere close to GPT-4 levels. Do you have any evidence for | that? | samstave wrote: | >> _Do you have any evidence for that_ | | I do not, yet I am a _SUPER SKEPTIC_ -- meaning I am a | conspiracy weirdo that doesn't believe a gosh darn thing | any government says, but I am also a technologist who is | not ignorant of things which have been built in secrecy. | | Thus, I surmise that some crazy shit is going on with | AI behind the scenes that we are not privy to -- and if one | person's reality is "you cannot believe that _they_ are | doing anything with AI that we don't know about", then, to | paraphrase a few others, "a nuke is literally about to fall | on our heads". | | -- | | We are moments away from realizing that it ALREADY | happened.... | goatlover wrote: | As far as we know, OpenAI has the most advanced LLM in the | world right now. Just because governments can fund all | sorts of expensive military contracts or spying networks | doesn't mean they have the cutting-edge AI research. A lot of | times it's companies that make the breakthroughs in tech. | samstave wrote: | Just so I understand how to come to that conclusion: how | did you come to that conclusion? | | Do we have _any_ transparent measure?
| AppleBananaPie wrote: | I'd love to hear from someone knowledgeable in the area, but my | gut instinct was that the majority of people / organizations | signing on to that "pause" just wanted to catch up. | sebzim4500 wrote: | Either that or the guy didn't actually sign it. It's not like | anyone actually checked who it was. | timestretch wrote: | He's mentioned signing it in interviews. He discusses it | here: https://youtu.be/SKoYhcC3HrM?t=1344 | huseyinkeles wrote: | This is amazing. They even let developers use it for | commercial purposes: | | "Developers can freely inspect, use, and adapt our StableLM base | models for commercial or research purposes, subject to the terms | of the CC BY-SA-4.0 license." | | You can use this link to interact with the 7B model: | | https://huggingface.co/spaces/stabilityai/stablelm-tuned-alp... | IceWreck wrote: | Is there a way to check the queue for this? | | I sent it one small text (actually a task) five minutes ago. | It's still loading. | pksebben wrote: | same. Probably hugged to death. | convery wrote: | It just finished building. | IceWreck wrote: | Just tested it. I tried to get it to write a conclusion | after giving it my report, and while it was correct, it | was kind of silly, a far cry from ChatGPT. But again, | this is the 7B variant and it's open source. | ilaksh wrote: | Amazing, but the instruction fine-tuning is still a huge | challenge for businesses, since what is released cannot be used | for commercial purposes. Instruction-tuned models are much more | useful. | | I have a feeling that there are probably some people who will | look at the "commercial okay" license for the first part, and in | their mind that will somehow make it okay to use the instruction- | tuned ones for commercial purposes. | | Maybe we don't really need the instruct stuff? Because it seems | like it's a huge amount of redoing the same work. I wonder if the | OpenAssistant people will start building off of these models. | dougmwne wrote: | The instruct tuning can be done with several open datasets at | minimal cost. It should be easy for someone to create their own | open model. | jacooper wrote: | How? | dougmwne wrote: | https://github.com/tatsu-lab/stanford_alpaca ___________________________________________________________________ (page generated 2023-04-19 23:00 UTC)