[HN Gopher] StableLM: A new open-source language model
       ___________________________________________________________________
        
       StableLM: A new open-source language model
        
       Author : davidbarker
       Score  : 980 points
       Date   : 2023-04-19 15:10 UTC (7 hours ago)
        
 (HTM) web link (stability.ai)
 (TXT) w3m dump (stability.ai)
        
       | [deleted]
        
       | Garcia98 wrote:
       | I really dislike this approach of announcing new models that some
       | companies have taken, they don't mention evaluation results or
       | performance of the model, but instead talk about how
       | "transparent", "accessible" and "supportive" these models are.
       | 
       | Anyway, I have benchmarked stablelm-base-alpha-3b (the open-
       | source version, not the fine-tuned one which is under a NC
       | license) using the MMLU benchmark and the results are rather
       | underwhelming compared to other open source models:
       | 
       | - stablelm-base-alpha-3b (3B params): 25.6% average accuracy
       | 
       | - flan-t5-xl (3B params): 49.3% average accuracy
       | 
       | - flan-t5-small (80M params): 29.4% average accuracy
       | 
       | MMLU is just one benchmark, but based on the blog post, I don't
       | think it will yield much better results in others. I'll leave
       | links to the MMLU results of other proprietary[0] and open-
       | access[1] models (results may vary by +-2% depending on the
       | parameters used during inference).
       | 
       | [0]: https://paperswithcode.com/sota/multi-task-language-
       | understa...
       | 
       | [1]: https://github.com/declare-lab/flan-
       | eval/blob/main/mmlu.py#L...
        
         | mnkv wrote:
         | How did you run the benchmarking, zero-shot or few-shot? I
         | think a fair comparison would be Llama-7B which got an average
         | ~35% for 5-shot.
        
           | Garcia98 wrote:
           | 5-shot prompting.
        
         | jvm wrote:
         | Doesn't make much sense to compare a model that's not fine
         | tuned to flan models that are fine tuned. Makes more sense to
         | compare to something like T5 base where it's probably a lot
         | more comparable.
        
           | Garcia98 wrote:
           | I would have compared it to the fine-tuned version if it had
           | been released under a truly open-source license. I think
           | developers implementing LLMs care more about licensing than
           | about the underlying details of the model.
           | 
           | Also t5-base is 220M params vs 3B params of stablelm, not
           | really a fair comparison anyways.
        
             | Taek wrote:
              | They used training data with restricted licenses to produce
              | the fine-tuned models; it's not a decision that Stability
              | was able to make.
        
               | Garcia98 wrote:
                | I disagree; they made the decision to use datasets with
                | restrictive licensing, jumping on the
                | alpaca/gpt4all/sharegpt bandwagon.
                | 
                | They also chose to toot their own horn about how open-
                | source their models are, even though for practical uses
                | half of their released models are no more open source
                | than a leaked copy of LLaMA.
        
               | peterstjohn wrote:
               | So just use their base model and fine-tune with a non-
               | restrictive dataset (e.g. Databricks' Dolly 2.0
               | instructions)? You can get a decent LoRA fine-tune done
               | in a day or so on consumer GPU hardware, I would imagine.
               | 
               | The point here is that you can use their bases in place
               | of LLaMA and not have to jump through the hoops, so the
               | fine-tuned models are really just there for a bit of
               | flash...
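                | 
                | As a rough sketch of what that looks like with the
                | peft library (the module name assumes the GPT-NeoX
                | attention layout these models use; not a full training
                | script):
                | 
                |     from peft import LoraConfig, get_peft_model
                |     from transformers import AutoModelForCausalLM
                | 
                |     base = AutoModelForCausalLM.from_pretrained(
                |         "stabilityai/stablelm-base-alpha-7b",
                |         device_map="auto")
                |     cfg = LoraConfig(
                |         r=8, lora_alpha=16, lora_dropout=0.05,
                |         target_modules=["query_key_value"],
                |         task_type="CAUSAL_LM")
                |     model = get_peft_model(base, cfg)
                |     model.print_trainable_parameters()
                |     # then fine-tune on e.g. the Dolly instruction
                |     # pairs with a normal causal-LM training loop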
        
       | sunshadow wrote:
       | Demo is overloaded. Here is Colab link if you wanna check the
       | quality: https://colab.research.google.com/github/Stability-
       | AI/Stable...
        
         | [deleted]
        
         | drexlspivey wrote:
         | Your session crashed after using all available RAM. If you are
         | interested in access to high-RAM runtimes, you may want to
         | check out Colab PRO
        
           | sunshadow wrote:
           | You can try 8bit
        
             | xkapastel wrote:
             | Using 8-bit still runs out of RAM for both the 3B and 7B
             | models. It's unclear if it's because it still uses more
             | than the available RAM, or if it's just quietly not using
             | 8-bit since it's not implemented.
        
         | BudaDude wrote:
         | This notebook will need high ram and a premium GPU. It blew
         | past the 16gb GPU that comes with Colab Pro
        
       | andy_xor_andrew wrote:
       | > The context length for these models is 4096 tokens.
       | 
       | !!! And I was excited that llama gave us 2048!!
        
       | txtai wrote:
       | Great to see the continued release of open models. The only
       | disappointing thing is that models keep building on CC-BY-NC
       | licensed datasets, which severely limits their use.
       | 
       | Hopefully, people consider txtinstruct
       | (https://github.com/neuml/txtinstruct) and other approaches to
       | generate instruction-tuning datasets without the baggage.
        
       | vagab0nd wrote:
        | I look forward to a future where almost all software allows me
        | to plug in an LLM API:
       | 
       | - browser: ad removal/skipping
       | 
       | - RSS: information aggregation
       | 
       | - recommendation systems
       | 
       | - games: customized NPC scripts; AI opponents
       | 
       | - home automation: personal butler
       | 
        | Hopefully, there will be more than one base-layer LLM provider
        | to choose from.
        
       | rvz wrote:
       | Finally, the true race to zero is now on.
       | 
       | Stability and others are already at the finish line in the race
       | to zero. OpenAI.com cannot get any cheaper and eventually will
       | have to increase their prices.
       | 
        | There is no escape, and as I said before, open source AI
        | models will eventually swamp, survive longer, and integrate
        | faster than even OpenAI.com's ChatGPT.
       | 
       | Expect open source models to get smaller and even better such
       | that it will fit in an iPhone, on device.
       | 
       | Stay tuned.
        
         | [deleted]
        
         | anticensor wrote:
         | Why are you referring to OpenAI as ai.com?
        
           | netsroht wrote:
           | I think he's just emphasizing that OpenAI is in fact not
           | open, thusly it's crossed out.
        
         | [deleted]
        
       | jhoelzel wrote:
       | Has anybody figured out the hardware requirements yet?
       | 
        | This could be the beginning of something beautiful, and I am
        | always impressed by the work Stable Diffusion does!
       | 
       | They are the true openAI (without empty promises in the middle
       | ;))
        
       | lanstin wrote:
        | GPT-4 supposedly cost USD 100M to train and seems much smarter
        | than anything else released. Is there any chance of getting
        | this much money for open-source training? Really big EU
        | grants, or some philanthropist, or...?
        
       | dougmwne wrote:
       | This is absolutely huge. LLaMA weights leaking was a big deal,
       | but you couldn't actually use them without attracting Meta's ire.
       | Would love to see some benchmarking vs. LLaMA and GPT.
        
         | pyinstallwoes wrote:
         | What is IRE?
        
           | vineyardmike wrote:
           | It's a noun meaning anger or wrath. Usually it's specific not
            | just to triggering anger _the emotion_, but an angry
           | _response_. So attracting Meta's ire means attracting an
           | angry response from meta - like the legal response users of
           | LLaMA have triggered.
           | 
           | It's not an acronym.
        
           | dougmwne wrote:
           | The real thing we should be asking is if the 3B or 7B model
           | is capable of correctly answering this question.
        
           | jwineinger wrote:
           | anger. probably a lawsuit in this context
        
             | jmiskovic wrote:
              | There were no lawsuits around LLaMA. They used DMCA
              | requests to take down some GitHub and HuggingFace releases,
              | but the majority of releases (Alpacas and other animals)
              | were left alone. It was enough to prevent commercial use,
             | though. Edit: a lawsuit would be quite interesting, to
             | clear up many things around how copyright works for LLMs.
        
           | inportb wrote:
           | Ire is the result of irritation.
        
             | ElFitz wrote:
             | Are you saying we shouldn't scratch Meta's itch?
        
             | wsgeorge wrote:
             | Oh this is good. :)
        
           | sangnoir wrote:
           | It's a department at meta: Intellectual Rights Enforcement.
           | /s
           | 
           | "Ire" is a synonym for "anger" or "wrath"
        
             | pyinstallwoes wrote:
             | That's actually what I thought it might be. Thanks lol.
        
           | antibasilisk wrote:
           | Internal recurrent expenditure
        
         | jacooper wrote:
         | Do we even know if model weights are copyrightable or not?
        
           | sebzim4500 wrote:
           | No but unless you enjoy being sued by Meta you have to act as
           | if they are.
        
           | janmo wrote:
           | I was asking myself the same question and am 99% sure it
           | isn't protected by IP laws. It is another story for the
           | training data and the source code used to run the model.
        
           | Invictus0 wrote:
           | A JPG is copyrightable and that's just an array of numbers.
           | Model weights seem pretty similar as a class.
        
             | dougmwne wrote:
             | It's not the JPG data you can copyright, but the
             | recognisable image it produces. Every time you re-save or
             | resize the image, the data changes, but the recognizable
             | image remains.
             | 
             | It's not clear how this process applies to model weights.
             | Once you run another training epoch on them, the data has
             | changed. What is the essential copyrightable, trademarkable
             | or patentable thing that remains? A legally untested
             | question for sure.
        
             | AustinDev wrote:
             | Couldn't you just transform the weights in some predictable
              | way and then untransform them before use at runtime? The
             | weights at rest would be completely distinct from the meta
             | weights.
        
         | brucethemoose2 wrote:
         | All sorts of llama weights are on huggingface now.
        
       | prions wrote:
       | > Supportive. We build models to support our users, not replace
       | them. We are focused on efficient, specialized, and practical AI
       | performance - not a quest for god-like intelligence. We develop
       | tools that help everyday people and everyday firms use AI to
       | unlock creativity, boost their productivity, and open up new
       | economic opportunities.
       | 
       | Refreshing take on the peak alarmism we see from tech "thought
       | leaders"
        
         | gumballindie wrote:
         | Well exactly. AI _is_ a tool and a very good one at that.
        
         | CamperBob2 wrote:
         | I dunno... god-like intelligence would be pretty useful. I'll
         | take a brochure.
        
           | nashashmi wrote:
           | do you trust god?
        
             | CamperBob2 wrote:
             | Well, ...
             | 
             | OK, I withdraw the comment.
        
             | sdenton4 wrote:
             | Why should I need to? Isn't God on the blockchain? (j/k)
        
         | garbagecoder wrote:
         | Yeah all the Terminator energy around these AI things is so
         | off-putting. They aren't like that. They're big matrices and
         | they are very cool tools!
        
           | bbor wrote:
           | What if big matrices are the last missing piece to research
           | going on since the 50s...
        
           | circuit10 wrote:
           | But the concerns about AI taking over the world are valid and
           | important; even if they sound silly at first, there is some
           | very solid reasoning behind it. They're big matrices, yes,
           | but they're Turing-complete which means they can
           | theoretically do any computational task
           | 
           | See https://youtu.be/tcdVC4e6EV4 for a really interesting
           | video on why a theoretical superintelligent AI would be
           | dangerous, and when you factor in that these models could
           | self-improve and approach that level of intelligence it gets
           | worrying...
        
         | varunjain99 wrote:
         | Well, it's to their benefit to portray their models as working
         | alongside and enhancing humans, as opposed to replacing us. So
         | it sounds a bit like marketing speak to me.
         | 
         | And it's to the benefit of many of those tech "thought leaders"
         | to be alarmist since they don't have much of the AI pie
        
         | seydor wrote:
         | Doesn't sell as much, though
        
         | reubenmorais wrote:
         | This is just marketing. They're positioning themselves as
         | somehow "more human" while building the exact same technology.
         | When a model supports me by doing the work I'd otherwise hire
         | someone to do, the model just replaced someone. And this goes
         | without saying, but a large amount of outsourced tasks today
         | don't exactly require "god-like intelligence".
        
           | rmbyrro wrote:
              | That was probably said about the automobile, when it replaced
              | horses, or about electric lamps, when they replaced
              | oil-based lamps, no?
           | 
           | I mean, every city had an army of people to light up and down
           | oil lamps in the streets, and these jobs went away. But
           | people were freed up to do better stuff.
        
             | leovailati wrote:
             | It is different this time. I bet that was also said when
             | the transformations that you mentioned occurred, but _this
             | time_ it really is different.
             | 
             | LLM models are pretty general in their capabilities, so it
             | is not like the relatively slow process of electrification,
             | when lamplighters lost their jobs. _Everyone_ can lose
             | their jobs in a matter of months because AI can do close to
             | everything.
             | 
             | I am excited to live in a world where AI has "freed" humans
             | from wage slavery, but our economic system is not ready to
             | deal with that yet.
        
             | [deleted]
        
         | antibasilisk wrote:
         | >Refreshing take on the peak alarmism we see from tech "thought
         | leaders"
         | 
         | It's not alarmism when people have openly stated their intent
         | to do those things.
        
           | dragonwriter wrote:
            | It's alarmism when it's used to support government
            | regulation that reinforces the moat: industry leaders say
            | they intend to do it, but also that the danger of it being
            | done is why competition with them must be restricted by
            | the State (and why they can't, despite being, or being a
            | subsidiary of, a nonprofit founded on an openness mission,
            | share any substantive information on their current
            | models).
        
       | ericpauley wrote:
       | I think there's a misunderstanding here (and on Alpaca) about
       | exactly what _non-commercial_ means with respect to software
        | licenses. Here's some related discussion:
       | https://opensource.stackexchange.com/a/12074
       | 
       | The CC-NC licenses cover modification and redistribution ("use"
       | in the license). They apparently have no bearing on whether you
       | can personally _run_ the software. And of course the outputs of
       | the software are not copyrightable.
        
       | GaggiX wrote:
       | the models: https://huggingface.co/stabilityai/stablelm-base-
       | alpha-3b, https://huggingface.co/stabilityai/stablelm-base-
       | alpha-7b
       | 
        | There are also tuned versions of these models:
       | https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b
       | https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b, these
       | versions are fine-tuned on various chat and instruction-following
       | datasets.
       | 
        | The GitHub repo mentions that the models will be trained on
        | 1.5T tokens, which is pretty huge in my opinion; the alpha
        | models are trained on 800B tokens. The context length is 4096.
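        | 
        | For a quick look, loading one of these with transformers is
        | roughly the sketch below (needs accelerate for device_map; the
        | special tokens follow the prompt format documented in the
        | repo):
        | 
        |     import torch
        |     from transformers import AutoModelForCausalLM, AutoTokenizer
        | 
        |     name = "stabilityai/stablelm-tuned-alpha-7b"
        |     tokenizer = AutoTokenizer.from_pretrained(name)
        |     model = AutoModelForCausalLM.from_pretrained(
        |         name, torch_dtype=torch.float16, device_map="auto")
        | 
        |     prompt = "<|USER|>What is open source?<|ASSISTANT|>"
        |     enc = tokenizer(prompt, return_tensors="pt")
        |     enc = enc.to(model.device)
        |     out = model.generate(**enc, max_new_tokens=64,
        |                          do_sample=True, temperature=0.7)
        |     print(tokenizer.decode(out[0], skip_special_tokens=True))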
        
         | bhouston wrote:
         | These models are huge. I assume they are not quantized down to
         | 4bits yet.
        
           | brucethemoose2 wrote:
            | Quantized versions will pop up on huggingface very soon, if
            | they aren't already there. It takes basically no time, much
            | less than something like an alpaca finetune.
        
       | nwoli wrote:
       | I love stability ai so much
        
       | enono wrote:
       | [flagged]
        
       | paxys wrote:
       | OpenAI really needs to donate its name to Stability.
        
       | Filligree wrote:
       | The 7B model just barely fits in a 3090, but I've yet to get
       | useful output; it usually gets stuck in a repetition loop.
       | 
       | Spent some time playing with sampling parameters, to no effect.
       | Has anyone else had a go?
        
         | cheald wrote:
         | If you're getting stuck in a sampling loop, try raising the
         | temperature (0.6+) and/or increasing the repetition penalty
         | (1.1-1.2 often work well).
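          | 
          | In transformers terms it's roughly the sketch below (assuming
          | a model and tokenizer are already loaded the usual way):
          | 
          |     # settings that often break repetition loops
          |     enc = tokenizer(prompt, return_tensors="pt")
          |     out = model.generate(
          |         **enc.to(model.device),
          |         do_sample=True,
          |         temperature=0.7,          # less greedy sampling
          |         repetition_penalty=1.15,  # >1.0 discourages repeats
          |         max_new_tokens=128,
          |     )
          |     print(tokenizer.decode(out[0], skip_special_tokens=True))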
        
         | nlightcho wrote:
         | Make sure you are using the instruction tuned model. The base
         | model will be difficult to prompt.
         | 
         | It works in 8-bit with about 12GB of VRAM usage. Here's sample
         | code:
         | 
         | https://gist.github.com/AlexanderDzhoganov/a1d1ebdb018e2e573...
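          | 
          | Roughly, 8-bit loading looks like the sketch below (needs
          | accelerate and bitsandbytes installed); the linked gist is
          | the authoritative sample:
          | 
          |     from transformers import AutoModelForCausalLM
          | 
          |     model = AutoModelForCausalLM.from_pretrained(
          |         "stabilityai/stablelm-tuned-alpha-7b",
          |         device_map="auto",   # spread layers across GPUs
          |         load_in_8bit=True,   # int8 weights via bitsandbytes
          |     )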
        
       | dang wrote:
       | https://github.com/Stability-AI/StableLM
        
       | [deleted]
        
       | quickthrower2 wrote:
       | I found it a bit difficult to get answers from:
       | 
       | Q. Can you recommend a good vegan restaurant in Sydney, near the
       | water
       | 
       | A. I'm not a vegan, so I don't have a favorite restaurant. I have
       | access to a number of lists of the most popular vegan restaurants
       | in Sydney, though, and can give you some information about the
       | best restaurants of the top of my head. Please let me know if
       | there's anything else I can do to assist.
       | 
       | Q. Ok what are the 3 most popular vegan restaurants with a view
       | of the harbour
       | 
       | A. The most popular vegan restaurants near the water are:
       | 
       | Q. Are what?
       | 
       | A. The most popular vegan restaurants near the water are
       | 
       | Q. Name a vegan restaurant
       | 
       | A.
       | 
       | 1. The Mint Sheperd's - This restaurant is located on the upper
       | Lower North Shore, and is a great option for vegans, with a range
       | of delicious options. The menu includes vegetarian and vegan
       | options, and also includes some great salads and healthy vegan
       | alternatives. 2. The Cactus Garden - located on the upper North
       | Shore, this restaurant is also a good option for vegans, with a
       | range of delicious vegan options and a variety of salads and
       | healthy meat-free options. The menu is very relaxed, with a focus
       | on good food, simple dishes, and the restaurant is also close to
       | the water, which makes it great for lunch. 3. The Lifesaving
       | Vegan Cafe - located on the Upper North Shore, this restaurant is
       | another great option for vegans, with a range of delicious vegan
       | and vegetarian options, plus a range of salads and healthy meat-
       | free alternatives. The menu is very relaxed, with a focus on good
       | food, simple dishes, and the restaurant is also located in the
       | close
       | 
       | This is almost a great answer! Just that these restaurants don't
       | exist.
       | 
       | I had 100x better experience with
       | https://news.ycombinator.com/item?id=35583349, running locally!
        
       | mousepad12 wrote:
       | [dead]
        
       | la64710 wrote:
       | Thanks for breaking ground here and being truly OSS
        
       | convery wrote:
        | Just tested the 7B-tuned version on Huggingface and it's not
        | great...
        | 
        |     > What features were added in C++ 98?
        |     < C++98 (also known as C++11) was a major development
        |       milestone for C++
        |     < new gcc 5.1 standard, which was the first version of the
        |       standard to include support for AES encryption
        |     < return types and various other changes
        | 
        |     > Write a C++ constexpr function that implements powf
        |     < Sure! Here is an implementation of a C++11 constexpr
        |       function for the function pow, using C++11 recursion and
        |       variadic templates:
        |     < void pow (int n) { std::cout << "Powerset of " << n
        |       << " is " << pow (n, n) << std::endl; }
        |     < This implementation of the pow function should be
        |       possible for any constant integer power of 2 without
        |       exception.
        
         | brucethemoose2 wrote:
          | Maybe you are not following the right syntax? LLaMA, Vicuna,
          | and Alpaca suffer if you don't format the query just right.
        
         | [deleted]
        
       | josephwegner wrote:
       | Anyone have a sense for system requirements on running this
       | locally? StableDiffusion was too heavy for either of my machines
       | (which are admittedly not very powerful) - I'm not seeing much in
       | the docs to indicate whether or not this is more or less
       | intensive to run locally.
        
         | ranguna wrote:
         | The tuned 7B model is around 33 GBs, so you'll need a PC with
         | that much VRAM or RAM. I haven't tried to load it on text
         | generation ui though.
        
         | wsgeorge wrote:
         | If you can run any models on llama.cpp, that might be a good
         | indicator of which StableLM models you'll be able to run.
         | 
          | I easily ran 7B int4 ggml models on an MBP with 16 GB of RAM.
          | The same works on an MBA with 8 GB of RAM, but you'll have to
          | avoid running any other memory-hogging apps.
        
       | tucnak wrote:
        | On a side note, can anybody tell me what's going on with
        | embeddings and vector databases? Certainly it would seem that
        | forward-pass completion is pretty much solved, and a smaller,
        | better model will appear eventually. Let's say you even
        | managed to solve both complete() and embed(): what do you do
        | with it, and how are you going to organise, query, and
        | multiply this dataset? Now, I know that text-embedding-ada-002
        | has twice as many dimensions as mainstream sentence-
        | transformers models. Do we need all the extra dimensions? If
        | not, how do I make it work better for my specific dataset with
        | lots of jargon and abbreviations and stuff like that? What are
        | the hardware requirements for that? I.e. could I do a fine-
        | tuning job on some specific jargon-heavy text to get better
        | embeddings for it? For one, the more I look into similarity-
        | based use-cases, the more I see that it's not, normally
        | speaking, "top-percentile nearest-neighbour search": the data
        | is also terribly relational, i.e. it's probably like a slowly
        | changing dimension, and there's a tree-traversal-type
        | structure in how documents are generated as output from other
        | documents as inputs. So you kind of have to think about these
        | complete/embed ops both in aggregate, for batching, and in
        | particular, from a cost/reward ROI type calculation. Not just
        | in aggregate but also in terms of memory usage patterns to
        | further optimise layout -- tiering and stuff like that really
        | comes to light.
       | 
       | Also: vector database shilling on HN is getting out of hand;
       | multiple companies literally plugging every mention on the radar,
       | some actively begging for upvotes. Looking at it all makes you
       | really appreciate pgvector[1] to a point where you would be more
       | willing to buy 3.2 TB of high-bandwidth NVMe and dedicate it to a
        | large IVF index than ever have to deal with all of this "purpose-
       | built vector database" bullshit.
       | 
       | [1]: https://github.com/pgvector/pgvector
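        | 
        | To illustrate the pgvector route, a minimal sketch (the table
        | layout, model choice, and vector literal here are my own
        | assumptions, not a recipe):
        | 
        |     # assumes Postgres with pgvector and a table like:
        |     #   CREATE TABLE docs (id serial, body text,
        |     #                      embedding vector(384));
        |     import psycopg2
        |     from sentence_transformers import SentenceTransformer
        | 
        |     model = SentenceTransformer("all-MiniLM-L6-v2")  # 384 dims
        |     vec = model.encode("jargon-heavy query").tolist()
        |     lit = "[" + ",".join(str(x) for x in vec) + "]"
        | 
        |     conn = psycopg2.connect("dbname=mydb")
        |     with conn, conn.cursor() as cur:
        |         cur.execute(
        |             "SELECT id, body FROM docs"
        |             " ORDER BY embedding <-> %s::vector LIMIT 5",
        |             (lit,),
        |         )
        |         for row in cur.fetchall():
        |             print(row)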
        
         | visarga wrote:
         | > Do we need all the extra dimensions?
         | 
         | This discussion seems relevant:
         | https://www.reddit.com/r/MachineLearning/comments/12q8rp1/di...
        
         | Der_Einzige wrote:
         | Yes, you need all of the dimensions. All of the dimensionality
         | reduction techniques, including SOTA ones (UMAP or better) are
         | going to massively harm your embeddings.
        
           | tucnak wrote:
            | Perhaps I didn't word my question correctly; I'm looking to
           | compare capability of Sentence transformers vs. OpenAI Ada-
           | based embeddings relative to their respective dimensionality?
        
       | lxe wrote:
       | Finetuning and training models of this size from scratch is now
       | relatively inexpensive and accessible to anyone, especially with
       | all these datasets becoming freely available. This is just the
       | beginning of smaller LLMs that anyone can finetune to achieve
       | high quality performance on particular tasks.
        
       | antimatter15 wrote:
       | Looks like Nolano.org's "cformers" includes a fork of
       | llama.cpp/ggml by HCBlackFox that supports the GPT-NeoX
       | architecture that powers EleutherAI's Pythia family of open LLMs
        | (which also powers Databricks' new Dolly 2.0), as well as
       | StabilityAI's new StableLM.
       | 
       | I quantized the weights to 4-bit and uploaded it to HuggingFace:
       | https://huggingface.co/cakewalk/ggml-q4_0-stablelm-tuned-alp...
       | 
        | Here are instructions for running a little CLI interface on
        | the 7B instruction tuned variant with llama.cpp-style
        | quantized CPU inference:
        | 
        |     pip install transformers wget
        |     git clone https://github.com/antimatter15/cformers.git
        |     cd cformers/cformers/cpp && make && cd ..
        |     python chat.py -m stability
       | 
        | That said, I'm getting pretty poor performance out of the
        | instruction tuned variant of this model. Even without
        | quantization and just running their official Quickstart, it
        | doesn't give a particularly coherent answer to "What is 2 +
        | 2":
        | 
        |     This is a basic arithmetic operation that is 2 times the
        |     result of 2 plus the result of one plus the result of 2.
        |     In other words, 2 + 2 is equal to 2 + (2 x 2) + 1 + (2 x 1).
        
       | jmatthews wrote:
       | The AI war has officially begun.
        
         | mach1ne wrote:
         | No, that comes when China catches up in the quest for
         | superintelligence.
        
           | xyzzy4747 wrote:
           | I don't think China has enough skilled AI workers to compete
           | in any quick timespan. Top talent doesn't want to live and
           | work there generally.
        
             | sanxiyn wrote:
             | China is demonstrably capable of training GLM-130B. Skills
             | involved seem comparable to Stability AI.
             | 
             | https://keg.cs.tsinghua.edu.cn/glm-130b/
        
             | [deleted]
        
             | kriro wrote:
              | I'm curious why you'd think that. China as a country has
              | many people, to start with. Some percentage of these
              | people will
             | end up in AI. Assuming people from all countries are
             | roughly equally intelligent, the numbers clearly favor
             | China. Universities over there are quite good, there's a
             | pretty strong "work hard" mentality I see from all our
             | Chinese students. Plenty of Chinese graduating or starting
             | university these days during the AI hype peak. China as a
             | country isn't sleeping on AI either. I think China as an AI
             | hub looks quite promising. Anecdotally, China also retains
             | quite a lot of talent or people go abroad to study and
             | return to China. Compared to some European countries or
             | India that "leak" a lot of talent to the U.S. I think China
             | is quite a bit more stable.
             | 
             | On the hardware side, things tend to be produced there as
             | well.
        
               | snordgren wrote:
               | China definitely "leaks" a lot of talent to American
               | companies - most AI papers that I've seen from respected
               | Western universities include at least one Chinese name.
               | 
               | One challenge for China has been the university
               | enrollment rate. While in Western countries half of each
               | cohort has been going to university for decades, China is
                | not there yet. In 2019, just 17% of Chinese adults had
                | degrees, compared to 44% in the US.
               | 
               | So the large Chinese population is offset by its relative
               | lack of access to education, while the US can draw from
               | its own highly educated population in addition to
               | attracting the best and the brightest from the rest of
               | the world, including China.
        
           | HybridCurve wrote:
           | There are plenty of authoritarians in the US to conduct
           | warfare against our institutions that are up to speed. We
           | don't need to wait for anyone else overseas to get this party
           | started.
        
           | ChatGTP wrote:
           | Stop worrying about China...wow
        
         | rafark wrote:
         | I'm more than ready. It's exciting to see what can come out of
         | this.
        
       | andy_xor_andrew wrote:
       | One question I have is- can we expect GPTQ quantization down to 4
       | bits (or lower) for these models, the same as for llama (which is
       | partly why llama was so amazing?)
       | 
       | I'm sure the answer is "yes", but I wonder if there's anything
       | that can be done that could make these models more "friendly" to
       | quantization like this from the very beginning.
       | 
       | It's absolutely incredible that you can cut the size of a model
       | in half... and then cut the size of the model in half *again*...
       | and it fits on your consumer GPU with only a small cut in
       | accuracy.
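        | 
        | Rough weight-only arithmetic (ignores activations, the KV
        | cache, and framework overhead):
        | 
        |     params = 7_000_000_000
        |     for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
        |         gib = params * bits / 8 / 1024**3
        |         print(name, round(gib, 1), "GiB")
        |     # fp16 13.0 / int8 6.5 / int4 3.3 GiB of weights alone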
        
       | brianjking wrote:
       | What is the actual LICENSE?
       | 
       | * HuggingFace shows CC-by-NC
       | https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b
       | 
       | * Github is Apache 2.0
       | 
       | https://github.com/Stability-AI/StableLM/blob/main/LICENSE
        
         | Taek wrote:
         | NC is just for the fine tuned model
        
       | GistNoesis wrote:
       | I'm worried about the version hell of relying on HuggingFace's
       | transformers.
       | 
        | In the model card, config.json [1] and generation_config.json
        | contain the string "transformers_version": "4.28.1", and the
        | config refers to a common architecture, "GPTNeoXForCausalLM".
        | 
        | Which, if I follow that string soft link correctly, means it
        | refers to this file [2], with a long history of commits
        | including some recent ones [3].
       | 
       | I don't understand how to get out of the version hell. I can
       | install and pin a version of the transformer library, but sooner
       | or later if I chain different foundation models I'll have to do
       | this for each different model card that has its own version.
       | 
        | And then those transformers library versions are probably
        | dependent on some PyTorch versions.
       | 
        | I don't understand how you can reliably track a bug or a
        | change in behavior, or guarantee that a model you spent
        | millions of dollars training doesn't rot, stays reproducible,
        | and doesn't become unusable amid this mess.
       | 
        | (And also the weights are a de-serialized pickled Python
        | dictionary, which from a safety point of view is equivalent to
        | running untrusted binaries and is very worrying on its own.)
       | 
       | [1]https://huggingface.co/stabilityai/stablelm-tuned-
       | alpha-7b/b... [2]https://github.com/huggingface/transformers/blob
       | /v4.28.1/src... [3]https://github.com/huggingface/transformers/co
       | mmits/v4.28.1/...
        
         | lhl wrote:
          | I've been diving in lately and, while it's not efficient, the
          | only way to manage it is to create a new conda/mamba
          | environment, or a custom Docker image, for all the conflicting
          | packages.
         | 
         | For safety and speed, you should prefer the safetensor format:
         | https://huggingface.co/docs/safetensors/speed
         | 
         | If you know what you are doing you can do your own conversions:
         | https://github.com/huggingface/safetensors or for safety,
         | https://huggingface.co/spaces/diffusers/convert
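          | 
          | A conversion sketch along those lines (you still have to
          | trust the pickle enough to load it once; file names are
          | placeholders):
          | 
          |     import torch
          |     from safetensors.torch import save_file
          | 
          |     sd = torch.load("pytorch_model.bin", map_location="cpu")
          |     # keep only tensors; tied/shared weights may need care
          |     sd = {k: v.contiguous() for k, v in sd.items()
          |           if isinstance(v, torch.Tensor)}
          |     save_file(sd, "model.safetensors")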
        
         | brucethemoose2 wrote:
          | > And then those transformers library versions are probably
          | dependent on some PyTorch versions.
         | 
          | They are not, and I don't think the model even cares about the
          | transformers version. I run git transformers/diffusers and
          | PyTorch 2.1 in all sorts of old repos, and if it doesn't
          | immediately work, usually there are just small changes to APIs
          | here and there that make scripts unhappy, and that you can
          | manually fix.
        
         | jxy wrote:
         | In terms of pickled files, we may ask them to release the
         | models in safetensors. That would be a big improvement.
         | 
         | There's not much we can do about dependencies on pytorch or
         | other python libraries. Perhaps people can make more
         | independent implementations. The redundancy in implementations
         | would help.
        
           | substation13 wrote:
           | I'm surprised that some of the cutting edge AI work is not
           | being done in pure Rust. Why even have a binding layer?
        
             | ElFitz wrote:
             | Python seems to have won over the Data / ML world.
             | 
             | So I'd wager they use what they and their intended audience
             | know.
        
               | tough wrote:
                | Someone implemented llama in Go recently:
                | https://github.com/gotzmann/llama.go
                | 
                | I wouldn't rule out a Rust implementation of some LLM
                | architecture appearing at some point.
        
               | Philpax wrote:
               | Hi! I'm a maintainer of
               | https://github.com/rustformers/llama-rs. We're planning
               | to expand our model support soon.
        
           | sdenton4 wrote:
           | The pickles drive me crazy.
           | 
           | Tensorflow saved models are a great way to solve the
           | problem... Save the computation graph and weights, and drop
           | all the crusty code dependencies. I think ONNX models are
           | similar. I expect there should be a Jax equivalent at some
           | point, as Jax is basically perfectly designed for this
           | (everything is expressed in lax operations, which allows
           | changing implementations for cpu/gpu/tpu freely... So just
           | save the list of lax ops).
        
       | seydor wrote:
        | When will we see ggml versions of these?
       | 
       | Good job on openAI to sell out in 2022. It was truly the end of
       | the line.
       | 
       | No matter how bad these model releases are , they are certain to
       | get awesome soon with everybody hacking around them. The
       | surprising success of MiniGpt4 with images shows that openAI's
        | GPTs don't have some magic secret sauce that we don't know of.
        
         | sebzim4500 wrote:
         | GPT-4 is still much, much better than any of the competition.
         | 
         | I guess we'll see once we have a 175B version of StableLM
         | though, presumably that will at least easily beat GPT-3.
        
       | franzypants wrote:
        | So, it can write very coherently, but I'm not sure it excels
        | at coding tasks.
       | 
       | For example:
       | 
       | User: Write Python code for evaluating the integral of f(x)=x^2.
       | 
        | Assistant: Sure, here's an example of Python code to evaluate
        | the integral of a function `f(x) = x^2`:
        | 
        |     # Define the function
        |     def integral_of_f(x):
        |         return x**2
        | 
        |     # Evaluate the integral
        |     result = integral_of_f(3)
        |     print("The integral of f(x) = x^2 is:", result)
       | 
       | In this code, we define the function `integral_of_f` as a
       | function that takes a single argument `x` and returns the
       | integral of `x^2`. We then evaluate the integral by calling the
       | `integral_of_f` function and passing in the value of `x` as an
       | argument.
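        | 
        | For contrast, code that actually integrates (e.g. symbolically
        | with sympy) would look more like this sketch:
        | 
        |     import sympy as sp
        | 
        |     x = sp.symbols("x")
        |     print(sp.integrate(x**2, x))          # x**3/3
        |     print(sp.integrate(x**2, (x, 0, 3)))  # 9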
        
       | vikp wrote:
       | It's fantastic that more orgs are releasing open-source models
       | trained on more than 300B or so tokens. Here's my take from the
       | details I could find.
       | 
        | Pros
        | 
        | - 4096 context width (vs 2048 for llama, gpt-j, etc)
        | - 3B to 65B released or in progress
        | - RL tuned models available
        | - Trained on more tokens than existing non-llama models
        | - 128 head dim, so can use flash attention (unlike GPT-J)
        | 
        | Cons
        | 
        | - No benchmarks released, or details about the model
        | - Somewhat restrictive license on the base models, and NC
        |   license on the RL models
        | - Small models only trained on 800B tokens, compared to 1T for
        |   llama-7B, and potentially more for other upcoming
        |   alternatives (RedPajama, etc). I'd like to see their loss
        |   curves to see why they chose 800B.
       | 
       | High-level, this is likely to be more accurate than existing non-
       | llama open source models. It's hard to say without benchmarks
       | (but benchmarks have been gamed by training on benchmark data, so
       | really it's just hard to say).
       | 
       | Some upcoming models in the next few weeks may be more accurate
       | than this, and have less restrictive licenses. But this is a
       | really good option nonetheless.
        
         | HarHarVeryFunny wrote:
         | They mention 1.5T training tokens, perhaps for the largest
         | model only ?
        
           | vikp wrote:
           | It's unclear which models will be trained to 1.5T tokens. The
           | details of how many tokens each model saw in training are on
           | Github - https://github.com/stability-AI/stableLM/ . But only
           | for the ones that have been released.
        
             | thunderbird120 wrote:
                | I just asked a stability employee and they said that the
             | current models ran into an overfitting issue probably due
             | to some duplicated data somewhere in their dataset, which
             | consists of 1.5T tokens. The 800B tokens is the number of
             | tokens they've been trained on so far. The plan is to keep
             | going and train on the rest of the data once the issue is
             | resolved.
        
               | HarHarVeryFunny wrote:
               | I've asked this question in a few places, and never been
               | able to get an answer, maybe you know...
               | 
               | Q: Why are these LLMs trained on a single epoch, and
               | perform worse if the dataset is repeated ?
               | 
               | This seems maybe related to suspecting data duplication
               | as a cause of overfitting.
               | 
               | Why don't LLMs need multi-epoch training at a low
               | learning rate to generalize? If they are managing to
               | learn from a single epoch, that sounds more like they may
               | be memorizing!
        
         | sebzim4500 wrote:
         | >- No [...] details about the model
         | 
         | You can see the model architecture here
         | 
         | https://github.com/Stability-AI/StableLM/blob/main/configs/s...
        
         | lhl wrote:
         | FYI, I'm running lm-eval now w/ the tests Bellard uses
          | (lambada_standard, hellaswag, winogrande, piqa, coqa) on the
          | biggest 7B on a 40GB A100 atm (non-quantized version, requires
          | 31.4GB), so it will be directly comparable to what various LLaMAs
         | look like: https://bellard.org/ts_server/
         | 
          | (UPDATE: the run took 1:36 to complete but failed at the end
          | with a TypeError, so I will need to poke and rerun).
         | 
         | I'll place results in my spreadsheet (which also has my text-
         | davinci-003 results):
         | https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
        
           | guywithabowtie wrote:
           | Do you also have results of GPT4 somewhere? or text-
           | davinci-003-turbo
        
             | lhl wrote:
             | I'm still on the waitlist for GPT-4 API access. Note, that
             | text-davinci-003 cost about $90 to benchmark at $0.02/1K
             | tokens, so if you're able to use a GPT-4 model (for
             | completion and not just instruction) that'll probably be
             | $270-$540 in credits to benchmark...
        
           | lunixbochs wrote:
           | Are you using https://github.com/EleutherAI/lm-evaluation-
           | harness?
        
             | lhl wrote:
             | Yeah, although looks like it currently has some issues with
             | coqa: https://github.com/EleutherAI/lm-evaluation-
             | harness/issues/2...
             | 
             | There's also the bigscience fork, but I ran into even more
             | problems (although I didn't try too hard)
             | https://github.com/bigscience-workshop/lm-evaluation-
             | harness
             | 
             | And there's https://github.com/EleutherAI/lm-eval2/ (not
             | sure if it's just starting over w/ a new repo or what?) but
             | it has limited tests available
        
           | sebzim4500 wrote:
           | How possible is it that every other model suffers from
           | dataset contamination and this model is being unfairly
           | penalized for having properly sanitized training data?
        
         | swyx wrote:
         | > 128 head dim, so can use flash attention (unlike GPT-J)
         | 
         | mind explaining why this is so attractive/what the hurdle is
         | for the laypeople in the audience? (me)
        
           | GaggiX wrote:
            | Standard attention has memory quadratic in sequence length,
            | whereas FlashAttention has memory linear in sequence length.
            | Also, FlashAttention is faster.
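            | 
            | Back-of-the-envelope (the head count below is just an
            | illustrative assumption), the full N x N score matrix that
            | standard attention materializes at a 4096 context is
            | already about a gigabyte per layer in fp16, which
            | FlashAttention never stores in full:
            | 
            |     seq, heads, bytes_per = 4096, 32, 2  # fp16 scores
            |     mib = seq * seq * heads * bytes_per / 1024**2
            |     print(mib, "MiB")  # 1024.0 MiB per layer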
        
             | WithinReason wrote:
             | According to the paper Flash Attention also needs quadratic
             | memory:
             | 
                | Let N be the sequence length, d be the head dimension,
                | and M be the size of SRAM with d <= M <= Nd. Standard
                | attention (Algorithm 0) requires Theta(N*d + N^2) HBM
                | accesses, while FlashAttention (Algorithm 1) requires
                | Theta(N^2 * d^2 / M) HBM accesses.
        
               | GaggiX wrote:
               | https://github.com/HazyResearch/flash-attention#memory
               | 
               | "standard attention has memory quadratic in sequence
               | length, whereas FlashAttention has memory linear in
               | sequence length."
               | 
               | I guess you have just reported how many times the layer
               | will need to access the memory, not how much memory usage
               | scales with sequence length.
        
             | sroussey wrote:
             | So there must be a downside to FlashAttention. What is it?
        
               | lhl wrote:
               | https://arxiv.org/abs/2205.14135 - Section 5 suggests
               | that the biggest limitation is that custom CUDA kernels
               | need to be coded on a per-GPU architecture basis.
        
               | kiraaa wrote:
               | installing it is a nightmare
        
               | fpgaminer wrote:
               | FlashAttention is mathematically identical to standard
               | attention, so in theory there's no downside. In practice,
               | numerical inaccuracies of floating point mean that the
               | results differ slightly. I don't know of any papers going
               | in depth to analyze what impact those variances have in a
               | range of real models, but generally speaking deep models
               | handle slightly variances well. I've not noticed any
               | difference in my applications training models. And tons
               | of people use FlashAttention as a drop-in replacement on
               | models trained on standard attention (e.g. using xformers
               | in StableDiffusion).
               | 
               | Also in practice FlashAttention is still relatively new
               | so it isn't well supported in libraries yet. Until
               | PyTorch 2.0 you had to either implement it yourself, or
               | use something like xformers which comes with a bag of
               | caveats. PyTorch 2.0 now has it built-in, and it's easy
               | to use, but the implementation is incomplete so you
               | can't, for example, use it with an attention mask (which
               | is needed in LLMs, for example).
               | 
               | tl;dr: Basically none, but it just isn't well supported
               | yet.
        
         | whimsicalism wrote:
         | > Small models only trained on 800B tokens, compared to 1T for
         | llama-7B
         | 
         | LLaMA is trained far beyond chinchilla optimality, so this is
         | not as surprising to me.
        
           | anentropic wrote:
           | According to this LLaMA still didn't go far enough:
           | https://www.harmdevries.com/post/model-size-vs-compute-
           | overh...
        
             | whimsicalism wrote:
             | Yep, it depends on what your goal is.
        
           | dragonwriter wrote:
            | But Chinchilla optimality, while an interesting result, is a
            | strange target for most practical purposes. Training happens
            | once, inference happens many times; not training past the
            | point where it's cheaper to train a larger model for the same
            | (proxy for) quality discounts the cost of inference to zero.
        
             | whimsicalism wrote:
             | Yep, but if stability has the goal of training the best
             | possible model then that would explain the choices they
             | made.
        
               | GaggiX wrote:
               | I mean 800B tokens on a 3B model and 7B model is still
               | way beyond the Chinchilla scale.
        
         | burtonator wrote:
         | Were you able to figure out if the RL models are going to be
         | jailed? A 65B parameter model could be a bit frightening.
         | That's 1/3rd the size of GPT3.
        
           | sebzim4500 wrote:
           | I'm sure there will be a bunch of different RL tuned versions
           | of them, RLHF isn't that expensive. IIRC Microsoft has
           | software that will do it for a few thousand dollars for a
           | model that size. I'm sure someone will release a non-
           | lobotomized version, maybe OpenAssistant.
        
           | kiraaa wrote:
            | It's not always about the size, but yeah it's really good!
        
         | beecafe wrote:
         | [dead]
        
         | DustinBrett wrote:
         | I'm wondering what the sweet spot for parameters will be. Right
            | now it feels like the MHz race we had back in the CPU days, but
         | 20 years later I am still using a 2-3GHz CPU.
        
           | lhl wrote:
           | I think "sweet spot" is going to depend on your task, but
           | here's a good recent paper that may give you some more
           | context on thinking about training and model sizes:
           | https://www.harmdevries.com/post/model-size-vs-compute-
           | overh...
           | 
           | There have also been quite a few developments on sparsity
           | lately. Here's a technique SparseGPT which suggests that you
           | can prune 50% of parameters with almost no loss in
           | performance for example: https://arxiv.org/abs/2301.00774
        
             | version_five wrote:
             | I was wondering if the longer training thing was a similar
             | phenomenon to the double-descent we see in other deep
             | learning models. Training for a really long time can
             | improve generalization (as can adding more parameters) -
             | but I don't know enough about LLM architecture to know if
             | that's relevant here. My skim of the blog post led me to
             | think it's proposing a different mechanism (scaling laws).
        
         | capableweb wrote:
         | > - 3B to 65B released or in progress
         | 
         | Seems they want to do 3B to 175B, although 175B is not in
         | progress yet.
        
           | ipsum2 wrote:
           | It's not efficient to do 175B. Training a smaller model (65B)
           | on more data gives better performance for the same compute.
        
             | tempaccount420 wrote:
             | If you want it to just regurgitate training data, sure. But
             | more parameters will always be better for more complex
             | tasks.
        
               | thewataccount wrote:
               | > But more parameters will always be better for more
               | complex tasks.
               | 
               | I think you should checkout this paper which discusses
               | the relationship of performance and the ratio of training
               | tokens to parameter count.
               | 
               | https://arxiv.org/abs/2203.15556
        
               | tempaccount420 wrote:
               | StableLM already has an optimal parameter number to
               | tokens ratio, so what's your point? They should train the
               | 65B model on even more tokens?
               | 
               | > StableLM is trained on a new experimental dataset built
               | on The Pile, but three times larger with 1.5 trillion
               | tokens of content
        
               | thewataccount wrote:
               | If I understand correctly, based on their prediction in
               | Table 3 on page 8, they do have enough tokens, but they
               | also need over a magnitude more compute time.
               | 
               | > It's not efficient to do 175B. Training a smaller model
               | (65B) on more data gives better performance for the same
               | compute.
               | 
               | This is OP's comment you replied to - so I was responding
               | under OP's context that the amount of compute time would
               | be the same, which I apologize I didn't make clear, and
               | my response was very poorly worded.
               | 
               | My intent was to link the paper because I think it
               | supports OP's statement that for the same amount of
               | compute time and a token ratio, the performance of a
                | smaller model will be better than a larger one
                | (assuming they haven't converged yet, which they
                | haven't at this size).
               | 
               | > If you want it to just regurgitate training data, sure.
               | 
                | This paper was about showing Chinchilla performing on
                | par with models many times larger than itself, showing
                | you don't need a 175B-size model to get more performance
                | than "regurgitating training data".
        
             | sebzim4500 wrote:
             | Depends on your compute budget.
        
             | kiraaa wrote:
             | and also easy to deploy
        
         | GaggiX wrote:
         | >Small models only trained on 800B tokens
         | 
         | "These models will be trained on up to 1.5 trillion tokens." on
         | the Github repo.
         | 
         | https://github.com/stability-AI/stableLM/#stablelm-alpha
        
           | youssefabdelm wrote:
            | That's great news, but one would think that since they're
            | behind Stable Diffusion, they'd use the insights behind it
            | and scale the data even further, resulting in better
            | quality from a smaller model that can run on most people's
            | machines.
           | 
           | Like... try 10 trillion or 100 trillion tokens (although that
           | may be absurd, I never did the calculation), and a long
            | context on a 7B parameter model, then see if that gets you
            | better results than a 30B or 65B parameter model on 1.5
            | trillion tokens.
           | 
           | A lot of these open source projects just seem to be trying to
           | follow and (poorly) reproduce OpenAI's breakthroughs instead
           | of trying to surpass them.
        
             | GaggiX wrote:
             | >try 10 trillion or 100 trillion tokens
             | 
             | Computation is not free and data is not infinite.
        
               | youssefabdelm wrote:
               | You could've said the same to OpenAI when they were
               | scaling GPT from 1 billion to 175 billion parameters.
               | We're all grateful they didn't follow that line of
               | thought.
               | 
               | But Stability does have access to a pretty big cluster,
               | so it's not paying cloud compute (I assume), so cost will
               | be less, and data of course is not infinite...never
               | stated that.
               | 
                | But considering 3.7 million videos are uploaded to
                | YouTube every day, 2 million scientific articles are
                | published every year, yada yada... that argument falls
                | apart.
               | 
               | At the very least implement spiral development... 1
               | trillion... 3 trillion... (oh it seems to be getting WAY
               | better! There seems to be a STEP CHANGE!)... 5
               | trillion... (holy shit this really works, lets keep
               | going)
        
               | dragonwriter wrote:
                | The training corpus is the problem. An extra trillion
                | tokens is (ballpark) an extra million KJV Bibles' worth
                | of text formatted for ingestion. And you probably picked
                | all of the low-hanging fruit, in terms of quality, prior
                | vetting, and being in a standard format for ingestion, in
                | your first trillion tokens of training data.
        
             | dragonwriter wrote:
             | > Like... try 10 trillion or 100 trillion tokens (although
             | that may be absurd, I never did the calculation)
             | 
              | But where's the corpus supposed to come from?
        
           | Taek wrote:
           | Devs confirmed that the small ones use 800B, 1.5T is for the
           | large ones
        
             | GaggiX wrote:
              | @thunderbird120 asked a Stability employee, who said the
              | plan is to keep training the models up to 1.5T. So I
              | don't know where you read this.
        
               | nickthegreek wrote:
               | https://github.com/Stability-AI/StableLM#stablelm-alpha
               | shows that the 3b and 7B had 800b training tokens.
        
               | Taek wrote:
               | That may be, but the weights you can download today were
               | trained on 800B
        
               | sroussey wrote:
               | I think they are "checkpoint" models in this case.
               | 
               | Will be fun to compare when completed!
        
               | GaggiX wrote:
                | Yes, of course; that's why they say "will be trained" in
                | the GH repo.
        
       | IronWolve wrote:
       | Are the XL 1024x1024 enhanced models released for stable
       | diffusion yet? That looks really good from the testing.
        
       | nologic01 wrote:
        | Is there something like Moore's law for LLMs that will
        | eventually turn them into ubiquitous compute?
        
       | matthasfun wrote:
        | I recently downloaded Diffusion Bee to my Mac for image
        | generation with Stable Diffusion.
       | 
       | Excited for when I can do the same with the StableLM.
        
       | beecafe wrote:
       | [dead]
        
       | transcriptase wrote:
        | Missed opportunity for Stability to call themselves "ClosedAI"
        | and then be the opposite of what the name suggests.
        
       | leaving wrote:
       | Well that was a trip.
       | 
       | I asked StableLM-Tuned-Alpha-7b to translate the Swedish phrase
       | "Skaplig bogpulka" into English (because ChatGPT and New Bing
        | refuse to do so as a result of their puritanical US censorship)
       | and it confabulated wildly.
       | 
       | It made up an Estonian folk song and somehow we ended up on the
       | etymology of the well-known Swedish word "Skeletor".
       | 
       | This one is not ready for prime-time, but I have hopes. Someone
       | please make a model that doesn't censor. I won't be paying one
       | thin dime for this stuff until it is censorship-free.
        
       | [deleted]
        
       | arpowers wrote:
       | Why does every LLM use "Hope this email finds you well" to start
       | virtually every letter and email?
        
         | ShamelessC wrote:
         | Have you tried something like:
         | 
         | "Never start an email with 'Hope this email finds you well'"
         | 
          | in your first prompt?
        
       | cmsj wrote:
       | For people wanting to run it locally, you can fit the 7b model
       | (just) into a 24GB VRAM GPU (e.g. 3090/4090). The 3b model
       | appears to be much more reasonable, but I would say the output
       | is.... of limited quality based on the few tests I've run thus
       | far.
        
         | cmsj wrote:
         | Gist that mostly just takes the notebook Stability AI have in
         | the github repo, and turns it into a script you can run locally
         | after installing a few dependencies from pip:
         | 
         | https://gist.github.com/cmsj/2d6b247ad4fc8f15011105feeda763e...
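          | 
          | A minimal version of the same idea (a rough sketch,
          | assuming the Hugging Face transformers/torch APIs and the
          | stabilityai/stablelm-tuned-alpha-7b checkpoint; the tuned
          | checkpoints use <|USER|>/<|ASSISTANT|> markers):
          | 
          |   import torch
          |   from transformers import (AutoModelForCausalLM,
          |                             AutoTokenizer)
          | 
          |   name = "stabilityai/stablelm-tuned-alpha-7b"
          |   tok = AutoTokenizer.from_pretrained(name)
          |   # fp16 weights just about fit in 24GB of VRAM
          |   model = AutoModelForCausalLM.from_pretrained(
          |       name, torch_dtype=torch.float16, device_map="auto")
          | 
          |   prompt = "<|USER|>Write a haiku about GPUs.<|ASSISTANT|>"
          |   inputs = tok(prompt, return_tensors="pt").to(model.device)
          |   out = model.generate(**inputs, max_new_tokens=64,
          |                        do_sample=True, temperature=0.7,
          |                        top_p=0.9)
          |   print(tok.decode(out[0], skip_special_tokens=True))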
        
         | millimeterman wrote:
         | I suspect the community will start creating lower
         | precision/quantized versions of the model very quickly. LLaMa
         | 30b quantized to 4 bits is runnable on a 3090/4090.
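          | 
          | Back-of-the-envelope memory math (weights only; a rough
          | sketch that ignores activations, the KV cache and framework
          | overhead) shows why 4-bit makes that possible:
          | 
          |   def weight_gb(params_billion, bits):
          |       # bytes = params * bits / 8
          |       return params_billion * 1e9 * bits / 8 / 1e9
          | 
          |   print(weight_gb(30, 4))   # ~15 GB, fits a 24GB 3090/4090
          |   print(weight_gb(7, 16))   # ~14 GB, why fp16 7B "just" fits
          |   print(weight_gb(7, 4))    # ~3.5 GB, easy on consumer GPUs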
        
         | janmo wrote:
          | You don't need a GPU to run the model; you can use your RAM
          | and CPU, but it might be a bit slow.
        
           | cmsj wrote:
           | It's very slow, and for the 7b model you're still looking at
           | a pretty hefty RAM hit whether it's CPU or GPU. The model
           | download is something like 40GB.
        
       | nwoli wrote:
       | No way, I thought this was months away still. So excited
        
         | MacsHeadroom wrote:
         | Completed training is. These are early "Alpha" checkpoints.
        
         | visarga wrote:
         | It's released but still cooking in the oven as we speak.
        
       | msoad wrote:
       | I feel "The Pile" needs a lot more to match what OpenAI has
       | trained their models on. The mysterious "Books2" and so on.
        
       | nickthegreek wrote:
       | Great to see Stability release this with a great license as well.
       | Any idea on the usecases for the 3B model? Will a model that
       | small suffer heavily from a 4bit quantization?
        
         | wsgeorge wrote:
         | > Will a model that small suffer heavily from a 4bit
         | quantization?
         | 
         | From my experience with quantized 7B llama models, avoid 3B if
         | you can. Without benchmarks, I think this is a decent rule of
         | thumb.
        
         | capableweb wrote:
         | Generating either a 0 or 1 in response to something? Maybe
         | useful in some testing scenarios? Honestly don't know either.
         | New lorem ipsum?
        
           | visarga wrote:
            | You can use it as the assistant model to a large model; it's
            | called speculative sampling. You generate text with the small
            | model and validate with the large one, ensuring no deviation
            | occurs. Speedup of ~2.5x.
           | 
           | https://arxiv.org/abs/2302.01318 (DeepMind)
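            | 
            | A greatly simplified greedy version of the idea (a
            | sketch, not the paper's exact rejection-sampling scheme;
            | assumes two Hugging Face causal LMs sharing a tokenizer):
            | 
            |   import torch
            | 
            |   @torch.no_grad()
            |   def speculative_step(draft, target, ids, k=4):
            |       # 1. draft k tokens cheaply with the small model
            |       out = draft.generate(ids, max_new_tokens=k,
            |                            do_sample=False)
            |       proposed = out[0, ids.shape[1]:]
            |       # 2. verify them with ONE pass of the big model
            |       logits = target(out).logits[0]
            |       preds = logits[ids.shape[1] - 1:-1].argmax(-1)
            |       accepted = []
            |       for i, tok in enumerate(proposed):
            |           if preds[i] == tok:            # big model agrees
            |               accepted.append(tok)
            |           else:                          # first mismatch:
            |               accepted.append(preds[i])  # keep its token
            |               break
            |       else:                              # all k accepted;
            |           accepted.append(logits[-1].argmax(-1))  # bonus
            |       new = torch.stack(accepted).unsqueeze(0)
            |       return torch.cat([ids, new], dim=-1)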
        
           | MacsHeadroom wrote:
            | That would be 1-bit quantization. In practice, quantization
            | below 8 bits is done in smart ways that keep the effective
            | output precision high while making the effective memory
            | footprint small.
           | 
           | For example, bucketing identical groups of weights and then
           | reusing one bucket for all the identical groups lowers the
           | effective bit quantization at the memory level while
           | retaining the uncompressed quantization quality.
           | 
            | There is literature on effective quantization levels below 1
            | bit (so-called "0-bit quantization"). But even then the
            | actual weights are typically 2-4 bits and there is just a
            | lot of reuse of weights going on.
           | 
            | Another neat trick is to line bins of weights up in order of
            | lowest to highest weight and compute a function to produce a
            | zero offset throughout the bins of weights, such that 0010 in
            | one bin and 0010 in another bin are upsampled to 16 bit and
            | then have different offsets added, maintaining uncompressed
            | 16-bit performance without the memory overhead.
           | 
           | There are many more tricks like this and many still to be
           | found!
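            | 
            | A toy illustration of the "bins with a shared scale and
            | zero offset" idea (a sketch of generic group-wise 4-bit
            | quantization, not any particular library's scheme; real
            | implementations also pack two 4-bit codes per byte):
            | 
            |   import torch
            | 
            |   def quantize(w, group=64, bits=4):
            |       # each group of weights shares one scale + offset;
            |       # the weights themselves become ints in 0..15
            |       qmax = 2 ** bits - 1
            |       w = w.reshape(-1, group)
            |       lo = w.min(dim=1, keepdim=True).values
            |       hi = w.max(dim=1, keepdim=True).values
            |       scale = (hi - lo).clamp(min=1e-8) / qmax
            |       q = torch.round((w - lo) / scale).clamp(0, qmax)
            |       return q.to(torch.uint8), scale.half(), lo.half()
            | 
            |   def dequantize(q, scale, lo):
            |       # "upsample" back to 16 bit: the same 4-bit code in
            |       # different groups maps to different real values
            |       # because of the per-group offset
            |       return (q.float() * scale.float() + lo.float()).half()
            | 
            |   w = torch.randn(1024, 1024)
            |   q, s, z = quantize(w)
            |   w_hat = dequantize(q, s, z).reshape(1024, 1024)
            |   print((w - w_hat.float()).abs().mean())  # small error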
        
       | jerrygoyal wrote:
       | what's the cut-off knowledge for this?
        
       | alexb_ wrote:
        | Absolutely a giant fan of Stability sticking to actual open-
        | source licenses and not licenses that impose restrictions on what
        | you can use it for. This is the future of AI! Beware of any org
        | that uses "ethical" licenses - they are not open source.
        | Stability is one of the few organizations that actually cares
        | about free software; you love to see it.
        
         | cjbprime wrote:
         | > These fine-tuned models are intended for research use only
         | and are released under a noncommercial CC BY-NC-SA 4.0 license,
         | in-line with Stanford's Alpaca license.
         | 
         | This is a no-commercial-use-allowed license; it is neither
         | considered free software nor open source, the definitions of
         | which disallow restrictions on what you can use the work for.
        
           | renewiltord wrote:
           | Interesting. A non-LLM hallucinating. And to think we used to
           | believe that was only a property of LLMs.
        
             | circuit10 wrote:
             | Huh? Everything they said is true, isn't it?
        
               | renewiltord wrote:
                | StableLM, the model that's actually theirs, is CC BY-SA
                | 4.0.
        
               | [deleted]
        
           | freedomben wrote:
           | The two sentences prior are important:
           | 
           | > _We are also releasing a set of research models that are
           | instruction fine-tuned. Initially, these fine-tuned models
           | will use a combination of five recent open-source datasets
           | for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT,
           | and HH. These fine-tuned models are intended for research use
           | only and are released under a noncommercial CC BY-NC-SA 4.0
           | license, in-line with Stanford's Alpaca license._
           | 
           | The snippet you quoted is not talking about the main model in
           | the announcement. It's talking about fine-tuned models based
           | on other models. Stability has to respect the license of the
           | originals. They cannot change it.
           | 
           | The main model is described higher up in the post and _is_
           | permissible for commercial:
           | 
           | > _Developers can freely inspect, use, and adapt our StableLM
           | base models for commercial or research purposes, subject to
           | the terms of the CC BY-SA-4.0 license_
        
             | btown wrote:
             | It also appears that CC BY-SA-4.0 is GPL-compatible. Not a
             | lawyer and this is not legal advice, but it certainly seems
             | like one could operate their own StableLM server/service
             | and allow proprietary code to use it over a network
             | interface, much like one could use a GPL-licensed database
             | system.
             | 
             | https://creativecommons.org/faq/#can-i-apply-a-creative-
             | comm...
        
           | seydor wrote:
            | Not their fault; the instruct-tuned models depend on non-open
            | data... which should be open, however. Scraping ChatGPT is
            | legal.
        
             | sanxiyn wrote:
             | Agreed. Scraping ChatGPT is against OpenAI terms of use and
             | OpenAI is entitled to terminate your access immediately
             | upon notice, but since ChatGPT output is not copyrighted
              | (nor copyrightable), output you acquired before termination
             | should be freely redistributable. I am not sure why
             | Stanford Alpaca authors think otherwise but they are wrong.
             | 
             | https://openai.com/policies/terms-of-use
             | 
             | Thank you for using OpenAI! These Terms of Use apply when
             | you use the services of OpenAI, L.L.C. (snip) By using our
             | Services, you agree to these Terms. (snip) You may not
             | (iii) use output from the Services to develop models that
             | compete with OpenAI. (snip) We may terminate these Terms
             | immediately upon notice to you if you materially breach
             | Sections 2 (Usage Requirements).
        
         | whywhywhywhy wrote:
         | "Ethics" will only ever be an excuse to lock this technology
         | behind one companies paywall. The only ethical AI is actually
         | free and open AI, how its trained is irrelevant imho as long as
         | we can all benefit. The negatives of the work of individuals
         | being used to train it outweigh the negatives of one company
         | just doing that and holding the power within their walls.
        
           | version_five wrote:
           | Yeah I wish there was more real investigation / analysis into
           | who is behind various "ethical AI" pushes and what they stand
           | to gain from it. From what I can see, many of the people
           | involved either are invested in companies that will somehow
           | certify your AI is ethical, or just want to stifle
           | competition so they can catch up. Of course there's also a
           | sprinkling of "current thing" supporters.
        
         | Xelynega wrote:
         | I have to disagree. Especially in the case of LLMs where new
         | API services are popping up all over the place, an "ethical"
          | license like the AGPL that requires the source be shared for
          | web services would accelerate development of the space as a
          | whole immensely.
        
           | EamonnMR wrote:
           | I think when they say ethical they're talking about RAIL
           | which has clauses about not doing bad guy stuff with their
           | models, not AGPL.
        
         | shostack wrote:
         | How is this sort of thing audited? I imagine there are all
         | sorts of lifestyle AI businesses that won't give two shits
         | about a license where people can't easily see or audit what is
         | being used.
        
         | burtonator wrote:
         | I am definitely not in favor of OSS license for models >=
         | GPT3...
         | 
         | Not unless they're aligned well.
         | 
         | There are all sorts of horrible use cases that these could be
         | used for.
        
           | circuit10 wrote:
           | For the worst use cases, like scams, why would they respect
           | the license? Though that probably applies for some legal but
           | immoral use cases
        
           | cheald wrote:
           | "Alignment" is just a euphemism for "agrees with me", though.
           | Humans aren't even aligned with each other. Demanding that AI
           | models be "aligned" is essentially a demand that AI only be
           | produced which agrees with your priors.
        
           | risho wrote:
            | It is true that there are concerns relating to open source
            | and AI, but surely having them be closed off, manipulated,
            | and controlled by untrustworthy corporations is worse.
        
             | wahnfrieden wrote:
             | they like the particular techbro that runs the corporation
             | currently. people will find all sorts of ways to lick the
             | boot of private ownership.
        
           | welshwelsh wrote:
           | The ideal alignment for a computer program is to be aligned
           | completely with the user, prioritizing their needs to the
           | exclusion of all others.
        
         | orra wrote:
         | I am very happy to see them use a true FLOSS licence. However,
          | it's a surprise to me, given that Stable Diffusion is
          | proprietary, released under one of those "ethical" licences.
        
       | r0l1 wrote:
       | How will they generate revenue?
        
       | earthboundkid wrote:
       | I tooted this yesterday:
       | 
       | Selling access to LLMs via remote APIs is the "stage plays on the
       | radio" stage of technological development. It makes no actual
       | sense; it's just what the business people are accustomed to. It's
       | not going to last very long. So much more value will be unlocked
       | by running them on device. People are going to look back at this
       | stage and laugh, like paying $5/month to a cellphone carrier for
       | Snake on a feature phone.
       | 
       | Web apps:
       | 
       | - Need data persistence. Distributed databases are really hard to
       | do.
       | 
       | - Often have network effects where the size of the network causes
       | natural monopoly feedback loops.
       | 
       | None of that applies to LLMs.
       | 
       | - Making one LLM is hard work and expensive. But once one exists
       | you can use it to make more relatively cheaply by generating
       | training data. And fine tuning is more reliable than one shot
       | learning.
       | 
       | - Someone has to pay the price of computation power. It's in the
       | interest of companies to make consumers pay for it up front in
       | the form of a device.
       | 
       | - Being local lets you respond faster and with access to more
       | user contextual data.
        
         | s3p wrote:
         | What does "stage plays on the radio" mean?
        
           | crucialfelix wrote:
           | When radio first started, people read plays written for the
           | stage, because that's what they knew and what they had. Later
           | people learned to write for the medium and make radio native
           | entertainment.
           | 
           | Same thing happened when TV arrived. They did live versions
           | of the radio entertainment on a set in front of a camera.
        
           | [deleted]
        
           | what_ever wrote:
           | Trying to be fancy to say things won't be running locally.
           | The orchestra/performers won't be live in front of you but on
           | the radio.
        
         | throwawayadvsec wrote:
         | I don't think it's going to happen in the next few years
         | 
         | the prices are gonna drop like hell, but ain't no way we run
         | models meant to run on 8 nvidia A100 on our smartphones in the
         | next 5 years
         | 
          | Just like you don't store the entirety of Spotify on your
          | iPhone, you're not gonna run any decent LLM on phones any time
          | soon (and I don't consider any of the small LLaMAs to be
          | decent).
        
           | w4ffl35 wrote:
           | > but ain't no way we run models meant to run on 8 nvidia
           | A100 on our smartphones in the next 5 years
           | 
           | m$ has been working on an AI chip since 2019 so i think we
           | will.
        
             | throwawayadvsec wrote:
             | An A100 is about the size of a brick, there is no way we're
             | fitting those 8 bricks in a phone in the next five years,
             | without even thinking about heat management
        
               | asciimike wrote:
               | An A100 HGX server is ~6kW of power consumption (and
               | associated heat), while an iPhone is O(1W). I agree that
               | a 6000x increase in energy density or 6000x decrease in
               | power consumption is unlikely in this decade.
        
             | d33 wrote:
             | They still can't beat the laws of physics. It's just the
             | kind of density that doesn't seem feasible to put in a
             | smartphone.
        
               | wewtyflakes wrote:
               | Human brains seem to be of the same order of magnitude in
               | terms of size, and seem more nuanced and capable than
               | GPT4.
        
             | what_ever wrote:
             | And? Nvidia has been working on AI chips for years.
        
           | davnicwil wrote:
           | the only thing I can say to this is that Apple have seemed
           | laser focused on tuning their silicon for ML crunching, that
           | that focus is clearly now going to be amped up further still,
           | and that in tandem the software itself will be tuned to Apple
           | silicon.
           | 
           | GPUs on the other hand are pretty general purpose. And 5
           | years on a focused superlinear ramp up is a long time, lots
           | can happen. I am not saying it's 100%, or even 80% likely.
           | It'll be super impressive if it happens, but I see it as well
           | within the realms of reason.
        
             | brucethemoose2 wrote:
             | The M2 and the 4090 are both very general purpose. In fact,
             | the 4090 allocates proportionally more silicon area to the
             | tensor cores than Apple allocates to the neural engine.
             | 
             | The M series is basically the only "big" SoC with a
             | functional, flexible NPU and big GPU right now, which is
             | why it seems so good at ML. But you can bet actual ML
             | focused designs are in the pipe.
        
               | zamnos wrote:
               | Well, and gobs of RAM, which only top end cards can
               | compete with.
        
             | tehsauce wrote:
             | Apple's new M2 Max has a neural engine which can do 15
              | trillion flops. Nvidia's A100 chip (released almost 3
             | years ago) can do 315 trillion flops. Apple is not going to
             | close this 20x gap in a few years.
        
               | moffkalast wrote:
               | > this 168x gap
               | 
                | FTFY; remember it takes 8 of those to even load the
                | thing (8 x 315 / 15 ~= 168). And when the average laptop
                | has that much compute, GPT-4 will seem like Cleverbot in
                | comparison to the state of the art.
        
               | sroussey wrote:
               | At some point, they will put the models in silicon. I'm
               | curious as to when... 5yr?
        
               | viraptor wrote:
                | That doesn't sound likely with the current architectures.
                | There may be some kind of specialisation, but an NN is
                | like a chip designer's nightmare. We can't do chips with
                | that many crossed lines. It's going to have to keep the
                | storage+execution engine pattern unless we have some
                | breakthroughs.
               | 
               | "More specialised than GPU" is the game for now.
        
               | davnicwil wrote:
               | right, it's a huge challenge.
               | 
               | I think the tuning the models to the hardware piece is
               | important, and of course there is much more incentive to
               | do this for Apple than nvidia because of the distribution
               | and ecosystem advantages Apple have.
               | 
               | But also, I don't know... let's see what the curve looks
               | like! It's only been a couple of years of these neural
               | engines. Let's see how many flops M3 can hit this year.
               | And then m4 the next. Again, 5 years is a long time
               | actually when real improvement is happening. I am
               | optimistic.
        
             | refulgentis wrote:
              | Curious, why do you think that? My knowledge is limited to
              | marketing material and my M2 vs my 3090, and my conclusion
              | so far would be that this claim shows up in every hardware
              | maker's marketing over the past couple of years.
        
           | viraptor wrote:
           | > but ain't no way we run models meant to run on 8 nvidia
           | A100 on our smartphones in the next 5 years
           | 
            | When I learned about neural networks, the general advice at
            | the time was "you'll only need one hidden layer, with
            | somewhere between the number of your input and output
            | neurons". While that was more than 5 years ago, my point is -
            | both the approach and the architecture change over time. I
           | would not bet on what we won't have in 5 years.
        
           | Analog24 wrote:
           | This is the reason why they're not going to move on device
           | anytime soon. You can use compression techniques, sure, but
           | you're not going to get anywhere near the level of
           | performance of GPT-4 at a size that can fit on most consumer
           | devices
        
             | mkaic wrote:
             | I think we'll see completely new architectures dominate in
             | the near future, ousting the transformer. I am strongly
             | suspicious that, while impressive, transformers use several
             | orders of magnitude more compute than is "needed" for the
                | tasks they perform--if for no other reason than that the
                | human brain performs similarly and only draws 20 watts!
                | And it isn't even an engineered system, just the product of
             | a very, very long history of natural selection! I fully
             | anticipate that we'll see AI in the near future that
             | achieves human-level performance on sub-human power budgets
             | like the ones you'd be constrained by on a phone :)
        
               | Analog24 wrote:
               | "neat future" is very ambiguous. At the moment there is
               | nothing even close to transformers in terms of
               | performance. I suspect you are right in general but I'm
               | not sure about the "near future" part, there needs to be
               | a pretty significant paradigm shift for that to happen
               | (which is possible, of course, I just don't see any hints
               | of it yet).
        
           | brucethemoose2 wrote:
           | Vicuna (LLama) 13B is pretty good IMO. A 20B model can
           | definitely fit in RAM on future devices.
        
         | WhiteNoiz3 wrote:
         | I agree - I think for security and privacy we need it to be on-
         | device (either that or there needs to be end to end encryption
          | with guarantees that data won't be captured for training).
         | There are tons of useful applications that require sensitive
         | personal information (or confidential business information) to
         | be passed in prompts - that becomes a non issue if you can run
         | it on device.
         | 
         | I think there will be a lot of incentive to figure out how to
         | make these models more efficient. Up until now, there's been no
          | incentive for the OpenAIs and the Googles of the world to make
         | the models efficient enough to run on consumer hardware. But
         | once we have open models and weights there will be tons of
         | people trying to get them running on consumer hardware.
         | 
         | I imagine something like an AI specific processor card that
         | just runs LLMs and costs < $3000 could be a new hardware
         | category in the next few years (personally I would pay for
         | that). Or, if apple were to start offering a GPT3.5+ level LLM
         | built in that runs well on M2 or M3 macs that would be strong
         | competition and a pretty big blow against the other tech
         | companies.
        
         | ForHackernews wrote:
         | ?
         | 
         | All software is sold as SaaS today, because it's more
         | profitable. The same will be true for LLMs.
        
         | coffeebeqn wrote:
         | LLMs also need persistent storage for best performance.
         | Otherwise you're always starting at square one
        
         | circuit10 wrote:
         | At least for now, good LLMs need GPUs that cost tens of
          | thousands, so a cloud API is the only reasonable option. It
          | definitely makes a lot of sense.
        
         | andrewcamel wrote:
         | On this topic, Apple is the sleeping giant. Sleeping tortoise
         | maybe. Everyone else has been fast out of the gates, but Apple
         | has effectively already been positioning to leap frog everyone
         | after a decade+ of M1 chip design. Ever since these chips
         | launched, the M1 chips have felt materially underutilized,
         | particularly their GPU compute. Have to believe something big
         | is going on behind the scenes here.
         | 
         | That said, wouldn't be surprised if the truth was somewhere in
         | between cloud-deployed and locally deployed, particularly on
         | the way up to the asymptotic tail of the model performance
         | curve.
        
           | ohgodplsno wrote:
           | M1 GPUs are barely real-world tested, alright chips. They're
           | far from being a sleeping giant.
        
           | smoldesu wrote:
           | What would a "leap frog" look like, in your mind? I'm
           | struggling to imagine how they're better positioned than the
           | competition, especially after llama.cpp showed us that
           | inference acceleration works with everything from AVX2 to ARM
           | NEON. Compared to Nvidia (or even Microsoft and ONNX/OpenAI),
           | Apple is somewhat empty-handed here. They're not out of the
           | game, but I genuinely see no path for them to dominate
           | "everyone".
        
             | yunwal wrote:
             | My guess is a leapfrog would have more to do with how LLMs
             | are integrated into an operating system, rather than just
             | coming out with a better model. I don't think we're gonna
             | get a substantially more capable LLM than GPT-4 anytime
             | soon, but fine-tuning it to sit on top of the core of an
             | operating system could yield results.
        
               | smoldesu wrote:
               | Feels like Microsoft already beat them to the punch.
               | Their ONNX toolkit has better ARM optimization than
               | Apple's own Pytorch patches, and their collaboration with
               | OpenAI places them pretty far ahead of the research
               | curve. I'm convinced Microsoft could out-maneuver Apple
               | on local or remote AI functionality, if they wanted to.
        
         | jrm4 wrote:
          | I think it may be naive to believe that the deciding factor in
          | how these things are used is likely to be "chip speed" or
          | "efficiency on the machine."
         | 
         | I wish we were in that world; but it more likely seems like it
         | would be "Which company jumps ahead quickest to get mindshare
         | on a popular AI related thing, and then is able to ride scale
         | to dominate the space?"
         | 
         | REALLY hope I end up being wrong here; the fact that so many
         | models are already out there does give me some hope.
        
         | GavinB wrote:
         | I'd suspect it will actually accelerate moving everything into
         | the cloud.
         | 
          | If your entire business is in the cloud, you can give an AI
          | access to everything with a single sign-on or some passwords.
          | If half is on the cloud and half is local, that's very
          | annoying to have all in-context for your AI assistant. And
          | there's no way we're getting everything locally stored again
          | at this point!
        
           | bugglebeetle wrote:
           | Right, this is why StabilityAI is getting in bed with Amazon,
           | so private, fine-tuned models can operate on all your data
           | sitting out there in S3 buckets or whatever.
        
         | fshbbdssbbgdd wrote:
         | Most of your bullets sound like arguments that local models
         | will win.
         | 
         | > Often have network effects where the size of the network
         | causes natural monopoly feedback loops.
         | 
         | This one in particular sounds like an argument that remote
         | models will win.
        
           | max51 wrote:
            | I don't think that's true for businesses, because they
           | won't want their data to be leaked and/or used for other
           | clients. The more data from your company you can feed the AI,
           | the more productive it will be for you. I'm not just talking
           | about semi-public documentation, but also things like emails,
           | meeting transcript, internal tools APIs, employee details,
           | etc.
           | 
           | If the AI service provider uses your data to help better
           | train their AI, it will be blacklisted by most companies. If
           | you keep them in silos, the centralisation will offer almost
           | no benefit while still being a very high privacy risk. The
           | only benefit they get is that it allows them to demo it and
            | see its potential, but no serious business will adopt it
           | unless you also provide a self-hosted solution.
           | 
           | I think the only people who will truly benefit from using
           | cloud services as a long term solution are personal users and
           | companies too small to afford the initial cost of the
           | hardware.
        
             | goatlover wrote:
             | That seems hard to believe for businesses which already
             | rely on Office, Teams and Sharepoint, since Microsoft will
             | be making its version of ChatGPT available for all its
             | products, and the integration will be too hard to pass up
             | on.
        
               | max51 wrote:
               | Microsoft is in a different situation because everyone is
               | already forced to trust them with their OS and o365. For
               | better or for worse, there are no current alternatives to
               | Windows and the office suite for most businesses. If you
               | already login to your OS with a Microsoft account and
               | process your data in Excel, adding an AI tool on top of
               | it is not a big jump. Very few others are in this
               | situation.
               | 
                | For every other AI service provider, good fucking luck
                | getting clients to trust you. I expect we will see a lot
                | of AI services that offer a cheap and easy-to-use cloud AI
               | subsidized by a very expensive self-hosted version. I
               | also expect a lot of data leaks and many high profile
               | incidents where an AI creates a document or code that
               | includes sensitive data from someone else (hard coded
               | passwords, API keys, etc.).
               | 
               | Even for a large company like Autodesk or Adobe, you
               | might trust them with your engineering drawings and your
               | new product design, but would you feel comfortable
               | uploading your code base for internal tools, employee
               | files, email communications, etc. to them? It's gonna be
               | a hard no for a lot of businesses
        
         | precompute wrote:
         | This is mostly why the future of computation only makes sense
         | monetarily if you have everyone shift to a thin client. So,
         | banning GPUs is likely considered a "necessary evil" by the
         | BigTech cognoscenti for accomplishing that goal.
        
         | qingdao99 wrote:
         | > But once one exists you can use it to make more relatively
         | cheaply by generating training data.
         | 
         | Is that a real technique? Why not just shrink down the model
         | itself directly somehow, is that not possible?
        
         | vsareto wrote:
         | I think you'll be paying a cloud provider instead of an API
         | company, but it'll still be a monthly service charge for the
         | service
         | 
         | I don't see small/medium companies getting into acquiring
         | hardware for AI
        
         | losvedir wrote:
         | This doesn't seem technically feasible to me. The state of the
         | art will for a long time require a lot more hardware to run
          | than is available on a consumer device.
         | 
         | Beyond which, inference _also_ benefits from parallelization,
         | not just training, so being able to batch requests is a
         | benefit, and more likely when access is offered via an API.
        
           | FL33TW00D wrote:
           | This isn't correct. It's 100% feasible to run very capable
           | models on consumer devices.
           | 
           | I wrote up a feasibility investigation last year:
           | https://fleetwood.dev/posts/a-case-for-client-side-
           | machine-l...
        
             | crazygringo wrote:
             | But there's a big difference between running "very capable"
             | models that use most available resources and consume tons
             | of power and run slowly...
             | 
             | ...versus running the best models available, in a few
             | seconds, without using up the memory the main app you're
             | using needs for running.
             | 
             | These are all mainly going to be run remotely for general
             | consumer usage for quite a while I think.
        
               | FL33TW00D wrote:
               | How many applications on your machine right now are GPU
               | accelerated? There is a chunk of silicon that spends
               | almost all its time doing nothing.
        
             | moffkalast wrote:
             | > very capable models
             | 
              | Well that's the problem though, those models don't come
              | anywhere close to being _useful_ at all. At least not yet.
              | And they also run much slower.
             | 
             | As compute increases in general, there will be larger and
             | more capable state of the art models and it'll make more
             | sense to just use those instead of trying to run some local
             | one that won't give you any useful answers. Data centers
             | will always have a few orders of magnitude more horsepower
             | than your average laptop, even with some kind of inference
             | accelerator card.
        
               | FL33TW00D wrote:
               | Stable diffusion isn't useful at all?
        
             | circuit10 wrote:
             | "very capable" and "state of the art" aren't the same
             | thing. In this case the gap is huge
        
               | FL33TW00D wrote:
               | This is true, but a good subset of tasks can be
               | accomplished by much smaller models. Copilot is 12B!
        
             | frozenlettuce wrote:
              | I ran Alpaca 7B on a 2015 Acer laptop (it was slow, but
              | worked!)
        
           | brucethemoose2 wrote:
           | I can already run Vicuna(llama) 7B on my 2020, 14" PC laptop
           | at ~3.5 tokens/sec, and more speed can definitely be squeezed
           | out.
           | 
           | Most future laptops and phones will ship with NPUs next to
           | the CPU silicon. Once they get enabled in software, that
           | means a 16GB machine can run a 13B model, or a 7B model with
           | room for other heavy apps.
           | 
           | As for the benefits of batching and centralization, that is
            | true, but it's somewhat countered by the high cost of server
           | accelerators and the high profit margins of cloud services.
        
         | burtonator wrote:
         | We're stuck here for a while due to the size, and cost, of the
         | larger models.
         | 
         | The main reason I want a non-cloud LLM is that I want one
         | that's unaligned.
         | 
         | I know I'm not a criminal and I want to stop being reprimanded
         | by GPT4.
         | 
         | What I'm most interested here is fine tuning the model with my
         | own content.
         | 
         | That could be super valuable especially if we could get it to
         | fact check itself, which you could with a vector database.
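          | 
          | The vector-database side of that is already easy to
          | prototype (a sketch using sentence-transformers and plain
          | cosine similarity; the model name and passages are just
          | placeholders):
          | 
          |   import numpy as np
          |   from sentence_transformers import SentenceTransformer
          | 
          |   enc = SentenceTransformer("all-MiniLM-L6-v2")
          | 
          |   # your own content, chunked into passages
          |   docs = ["The 2023 budget allocated $1.2M to infra.",
          |           "Our API rate limit is 600 requests/minute."]
          |   doc_vecs = enc.encode(docs, normalize_embeddings=True)
          | 
          |   def supporting_passages(claim, k=1):
          |       # passages most similar to a claim the LLM produced,
          |       # to check it or inject it back into the prompt
          |       q = enc.encode([claim], normalize_embeddings=True)[0]
          |       sims = doc_vecs @ q          # cosine (unit vectors)
          |       top = np.argsort(-sims)[:k]
          |       return [(float(sims[i]), docs[i]) for i in top]
          | 
          |   print(supporting_passages("What is the API rate limit?"))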
        
           | shostack wrote:
           | What's been so interesting with the explosion of this has
           | been how prominently the corporately-driven restrictions have
           | been highlighted in news and such.
           | 
            | People are getting a good look, in very easy-to-understand
            | terms and at the foundational stage, at how limiting a future
            | is where this is just another big-tech-controlled thing.
        
           | WhiteNoiz3 wrote:
           | They have said that the alignment actually hurts the
           | performance of the models. Plus for creative applications
           | like video games or novels, you need an unaligned model
           | otherwise it just produces "helpful" and nice characters.
        
         | skybrian wrote:
         | This is sort of like saying the world wide web is a fad. Many
         | people made that argument, but a lot of desktop apps got
         | replaced by websites even though they were supposedly inferior.
         | 
         | ChatGPT works fine as a website and you don't need to buy a new
         | computer to run it. You can access your chat history from any
         | device. For many purposes, the only real downside is the
         | subscription fee.
         | 
         | If LLM's become cheaper to run, websites will be cheaper to
         | run, and there will be lower-cost competition. Maybe even cheap
         | enough to give away for free and make money from advertising?
        
           | zirgs wrote:
           | The other downside starts with an "As an AI language
           | model..."
        
             | ModernMech wrote:
             | And ends with "...I'm afraid I can't do that."
        
         | lairv wrote:
          | This doesn't seem that obvious to me; serving LLMs through an
          | API allows for highly optimized inference with stuff like
          | TensorRT and batched inference, while you're stuck with batch
          | size = 1 when processing locally.
          | 
          | LLMs don't even require full real-time inference; there are
          | applications like VR or camera stuff where you need real-time
          | <10ms inference, but for any application of LLMs 200-500ms is
          | more than fine.
         | 
         | For the users, running LLMs locally means more battery usage
          | and significant RAM usage. The only true advantage is privacy,
          | but this isn't a selling point for most people.
        
           | pornel wrote:
           | You're still thinking in terms of what APIs would be used
           | for, rather than what local computation enables.
           | 
           | For example, I'd like an AI to read everything I have on
           | screen, so that I can ask at any time "why is that? Explain!"
           | without having to copy paste the data and provide the whole
           | context to a Google-like app.
           | 
           | But without privacy guarantee (and I mean technical one, not
           | a pinky promise to be broken when VC funding runs out)
           | there's no way I'd feed _everything_ into an AI.
        
           | brucethemoose2 wrote:
           | We are very close to optimized ML frameworks on consumer
           | hardware.
           | 
           | And TBH most modern devices have way more RAM than they need,
           | and go to great lengths to just find stuff to do with it.
            | Hardware companies also very much like the idea of heavy
            | consumer applications.
        
         | [deleted]
        
       | thatwasunusual wrote:
       | Is this a joke? I tried the HuggingFace demo, and it just spews
       | out ridiculous replies for everything.
       | 
       | Examples:
       | 
       | >> What can you tell me about Therese Johaug?
       | 
       | > The person you're asking about is Therese Johaug (The Judge) a
       | Norwegian jazz violinist. She played with many famous musicians,
       | including and especially Charles Mingus. She was born in 1954 in
       | the town of Kolsas. She is married to the violinist Mark
       | Sorensen.
       | 
       | Everything in that statement is wrong. She is a Norwegian cross-
       | country skier, born in 1988 in Dalsbygda, not married.
       | 
       | Ok. Let's try something easier.
       | 
       | >> What can you tell me about Edvard Munch?
       | 
       | > -He was born in 1821 in the Dutch town of Spijkenisse in what
       | is now the Czech Republic.
       | 
       | > -He studied art at the Academy of Fine Arts in Brussels,
       | Belgium, where he made a portrait of a young girl called Anna.
       | 
       | > -He then moved to Norway to practice with his friends and Dano-
       | Norwegian painter, Christian Krohg, the two of them known as "The
       | Scream" or "Kristine in the Black".
       | 
       | > -There he met the painter, Ambroise Thomas, who became a very
       | close friend of Munch's and would often take him to the beach to
       | watch the boats.
       | 
       | > -One day in 1883 Munch bought a house at a deserted place in
       | the outskirts of Oslo, in the southern region of Norway.
       | 
       | > -The house, called "La Chute", had no inhabitants at all, it
       | was just a huge pile of rocks, but this Munch loved it so much
       | that he named it "Lochte".
       | 
       | > -He took the name "Munch" because he thought it was German, but
       | it was totally French, he had the idea that he could paint the
       | ocean and the sky, so he named his new painting "Moulin Sur
       | l'Oiseau".
       | 
       | > -At the same time, he would soon start creating works of art
       | that would be strange and not make any sense, for instance, one
       | summer day in 1870 he painted what is known now as "The Scream",
       | it has many other paintings with the same title, "The Scream" is
       | a drawing of a young girl, she is holding an ax and is crying in
       | the scream position.
       | 
       | > -He had the idea of "The Scream" in one sitting in 1883, but it
       | took six months to paint.
       | 
       | > -He stayed in France in the studio of Christian Krohg when he
       | was still Edouard de Badeley, one of the greatest painters of his
       | time.
       | 
       | > -In 1891 he returned to Norway to France and lived there for
       | the rest of his life.
       | 
       | > -He died in Paris in 1900.
       | 
       | Christ on a bike.
       | 
       | My favourite must be that "The Scream is a drawing of a young
       | girl, she is holding an ax and is crying in the scream position."
        
         | tmikaeld wrote:
         | Agreed, it doesn't even do the basic stuff that GPT-2 did, how
         | can anyone be impressed by this?
        
         | franzypants wrote:
         | I get the sense that any accurate portrayal of the model is
         | getting downvoted. Sure, it's going to be a while before open-
         | source models catch up with OpenAI. I'm glad that Stability is
         | working on this, and I look forward to trying out their larger
         | models that they're training right now, but we should be able
         | to discuss accurately the capabilities of an LLM without just
         | automatically jumping on the hype train.
        
           | ShamelessC wrote:
           | That's fair, but this:
           | 
           | > Is this a joke? I tried the HuggingFace demo, and it just
           | spews out ridiculous replies for everything.
           | 
           | Is a shallow and uninformed criticism. The parameter count
           | alone should inform people that this isn't going to match
           | GPT-3.5 or GPT-4. Not everyone knows that of course, but
           | starting a comment with "Is this a joke?" is precisely the
           | wrong way to start such a discussion.
        
         | adrianmonk wrote:
         | Yeah, I've tried a few things, and so far I've gotten no
          | answers I'd consider acceptable. Here are some examples.
         | 
         | -----
         | 
         | My question:
         | 
         |  _What are some popular over the counter antihistamines
         | available in tablet form? Please name three, and for each one,
         | give the generic and brand name, and NO OTHER INFORMATION AT
         | ALL._
         | 
         | Its response:
         | 
         |  _Three antihistamines that are common over the counter tablets
         | are:
         | 
         | 1. Claritest
         | 
         | 2. Zyrtec
         | 
         | 3. Marzolast
         | 
         | Brand names for these tablets are:
         | 
         | 1. Claritest - Contains diphenhydramine hydrochloride, which is
         | an expectorant and cold and allergy relief medicine.
         | 
         | 2. Zyrtec - Contains diphenhydramine hydrochloride, which is an
         | expectorant and antispasmodic medication for motion sickness.
         | 
         | 3. Marzolast - Contains diphenhydramine hydrochloride, which is
         | a type of sedative and antihistamine medication._
         | 
         | Things wrong with this answer:
         | 
         | * Claritest isn't an antihistamine. It's a non-invasive
         | prenatal DNA screening test.
         | (https://www.genpathdiagnostics.com/patients/womens-
         | health/ge...)
         | 
         | * Marzolast isn't an antihistamine. It isn't anything. It only
         | has one meaningful hit on Google, which is a coupon code for a
         | Udemy course.
         | 
         | * It says Zyrtec contains diphenhydramine hydrochloride, which
         | is an antihistamine, but the wrong one. Zyrtec contains
         | cetirizine.
         | 
         | * It gives three different, conflicting explanations for what
         | diphenhydramine is. The last one is right, and the first two
         | are totally wrong.
         | 
         | * I asked it (pretty clearly!) to give no other information,
         | but it gave a lot.
         | 
         | -----
         | 
         | My question:
         | 
         |  _What film made red staplers famous?_
         | 
         | Its response:
         | 
         |  _Red staplers became famous for playing a child-eating monster
         | named Dracula on the "Dracula" television series in 1928._
         | 
         | Things wrong with this answer:
         | 
         | * Television series in 1928?! There basically weren't any
         | except an experimental one called The Queen's Messenger.
         | 
         | * Red staplers isn't an actor. There's no playing anybody.
         | 
         | * I'm pretty sure Dracula doesn't eat children. Wrong type of
         | monster.
         | 
         | -----
         | 
         | My question:
         | 
         |  _What's the difference between a fork and a spoon?_
         | 
         | Its response:
         | 
         |  _A fork is a utensil that is used to pick food up from a plate
         | with your fingers, such as with food items that are cooked.
         | Fingers are made of many different parts, including bones,
         | muscles, and tendons. The two most basic kinds of utensils are
         | spoons and forks. Forking is typically done by the dining
         | creature's hands and includes handling both the bread and the
         | fish._
         | 
         | Things wrong with this answer: I'm not even going to bother
         | this time.
        
       | sroussey wrote:
       | "also fine-tuned the model with Stanford Alpaca's procedure using
       | a combination of five recent datasets for conversational agents:
       | Stanford's Alpaca, Nomic-AI's gpt4all, RyokoAI's ShareGPT52K
       | datasets, Databricks labs' Dolly, and Anthropic's HH. We will be
       | releasing these models as StableLM-Tuned-Alpha."
       | 
        | They have released the 3B and 7B of both the base and instruction-
        | tuned models. The 30B and 65B are in training and will be
        | released later.
        
       | dmm wrote:
       | Issue to add support for these models to llama.cpp:
       | https://github.com/ggerganov/llama.cpp/issues/1063
        
       | aketchum wrote:
       | Are they releasing the weights for download? The links to
       | HuggingFace in the readme are giving me 404. This dataset they
       | built on-top of "The Pile" sounds interesting - looking forward
       | to evaluating their claim that 3-7 billion param models can
       | perform on par with 175 billion param GPT-3
        
         | artyomxyz wrote:
         | It looks like it's available here
         | https://huggingface.co/stabilityai/stablelm-base-alpha-7b/tr...
        
         | turingfeel wrote:
         | Did they claim this? I didn't see that claim made in the above
         | post.
        
           | aketchum wrote:
           | "The richness of this dataset gives StableLM surprisingly
           | high performance in conversational and coding tasks, despite
           | its small size of 3 to 7 billion parameters (by comparison,
           | GPT-3 has 175 billion parameters)."
           | 
           | So they did not explicitly say it is comparable, but
           | implicitly compared the two. I'm curious to evaluate what
           | "surprisingly high performance" means exactly.
        
       | mightytravels wrote:
       | I love the installation of this! Just open a Python window and it
       | downloads whatever it needs. Well done!
        
       | dr_dshiv wrote:
       | How can one get started adding RLHF to something like this?
        
         | coolspot wrote:
         | Open-assistant.io
        
         | seydor wrote:
         | Pretty soon there are going to be desktop apps for that
        
       | jonplackett wrote:
       | Anyone know if this will be added to the Stability API?
        
       | lxe wrote:
       | The model is very good. Looks like it's using GPT-NeoX structure,
       | and the finetuned 7b version is very quick, and beats everything
       | else in the category (vicuna, oasst, openchatkit, gpt4all,
       | etc...) in my "write a poem about cheese" benchmarks.
        
         | sebzim4500 wrote:
         | I like how 'fun' benchmarks like yours seem to be so much more
         | effective at judging model performance than all the traditional
         | NLP benchmarks.
         | 
          | I still think the best way to compare two models is to simulate
         | a rap battle between them, then it's immediately obvious who
         | wins.
        
           | coolspot wrote:
           | Yep, rap battle is how I knew Bard is nowhere close to
           | ChatGPT.
           | 
            | In the past, the whole world was watching Kasparov vs Deep
            | Blue. This time we will do Eminem vs LLM.
           | 
           | What a time to be alive!
        
       | SeanAnderson wrote:
       | I'm curious if this will give better results than llama 7B? Llama
       | 7B felt like a toy that, while cool to be able to run locally,
       | did not feel useful in any way when contrasted to the state of
       | GPT. Here's hoping for better and/or release of larger parameter
       | models with low performance requirements soon :)
       | 
        | EDIT: my first question times out when run online; seems like
       | huggingface is getting hugged to death.
        
         | nwoli wrote:
          | They have people from Anthropic on their team, and Anthropic
          | has stuff like Claude Instant, which is likely running a very
          | light model - i.e. the tricks from Anthropic have likely been
          | incorporated into the model they're presenting here.
        
         | HarHarVeryFunny wrote:
         | If you're comparing to ChatGPT performance, then Vicuna 13B
         | would be the best comparison point for something Llama-based.
        
           | bugglebeetle wrote:
           | Vicuna 13B performance is an order of magnitude below ChatGPT
           | for all but gimmicky conversational stuff. Try giving both
           | somewhat large, task-based prompts with steps and see what
           | happens.
        
             | HarHarVeryFunny wrote:
             | Maybe I should have phrased that better! I didn't mean that
             | Vicuna was comparable to ChatGPT, just that it's the best
             | Llama-based comparison you can make (since it's at least
             | been conversationally trained).
        
             | dragonwriter wrote:
             | > Vicuna 13B performance is an order of magnitude below
             | ChatGPT for all but gimmicky conversational stuff.
             | 
             | Until you connect it to external resources, I tend to think
             | of anything you do with "brain-in-a-jar" isolated ChatGPT
             | as gimmicky conversational stuff.
        
               | goatlover wrote:
                | ChatGPT is still going to be way more capable when you
                | use its API to connect to external resources.
        
           | levesque wrote:
           | Isn't ChatGPT a 165B parameter model?
        
             | sanxiyn wrote:
             | No. OpenAI haven't disclosed parameter count of GPT-3.5 or
             | GPT-4, which are models used by ChatGPT. You may be
             | thinking of GPT-3, which is indeed a 175B parameter model.
        
               | levesque wrote:
                | Ah, interesting. I thought GPT-3.5 had the same structure
                | as GPT-3, for some reason. GPT-4 would obviously be
                | different.
        
         | pksebben wrote:
         | Even if it doesn't initially, the fact that it's being released
         | so permissively is massive - stable diffusion was made far more
         | powerful by being hackable at all levels and I can't imagine we
         | won't see the same here.
         | 
         | I imagine things like control nets that restrict output to
         | parsable types, LoRA-style adaptations that allow mixable
         | "attitudes", that sort of thing.
         | 
         | Very different underlying architecture from diffusers, ofc. But
         | the action of open source is the same - a million monkeys with
         | a million xterms and so forth.
        
           | edkennedy wrote:
           | I'm really hoping for the ability to load in different sets
           | of trained material as embeddings/textual inversions like in
           | Stable Diffusion. Imagine scanning in some of your favorite
           | philosophy and design books and throwing them with small
            | weighting as a little flavor to your answers. The crossovers
            | between LLM and Stable Diffusion-type models (like LoRAs)
            | are such a fascinating space to explore.
        
             | cheald wrote:
              | This has been around for GPT models for a while in the form
              | of "soft prompts", which are roughly analogous to textual
              | inversion in the Stable Diffusion space.
        
             | alex_sf wrote:
             | This already exists for LLaMA, I'm sure it'll be
             | implemented for StableLM as well.
             | 
             | https://github.com/lxe/simple-llm-finetuner
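              | 
              | A minimal sketch of what a LoRA adapter would look like
              | with the peft library, assuming StableLM keeps GPT-NeoX's
              | fused "query_key_value" attention projection (worth
              | verifying the module names on the actual checkpoint):
              | 
              |     from transformers import AutoModelForCausalLM
              |     from peft import LoraConfig, TaskType, get_peft_model
              | 
              |     base = AutoModelForCausalLM.from_pretrained(
              |         "stabilityai/stablelm-base-alpha-7b")
              | 
              |     lora_cfg = LoraConfig(
              |         task_type=TaskType.CAUSAL_LM,
              |         r=8,                 # rank of the low-rank update
              |         lora_alpha=16,
              |         lora_dropout=0.05,
              |         target_modules=["query_key_value"],
              |     )
              |     model = get_peft_model(base, lora_cfg)
              |     model.print_trainable_parameters()  # only adapters train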
        
       | anononaut wrote:
       | My prayers have been answered
        
       | IceWreck wrote:
       | StabilityAI is the real OpenAI. Thanks for this.
        
         | diminish wrote:
          | Well said. OpenAI is a promise unkept. Thanks StabilityAI for
          | existing.
        
           | nashashmi wrote:
            | When has open source ever spearheaded independent innovation?
            | It usually follows along.
           | 
           | Fred Wilson once did a take on all trends in SV. First some
           | firm comes out with a product that changes the landscape and
           | makes a massive profit. Then some little firm comes along and
           | does the same for a cheaper price. Then some ambitious group
           | out of college comes out with an open-source version of the
           | same.
           | 
            | Open source has never been a trailblazer of innovation. Open
            | "research" was the original mantra for OpenAI, and an
            | entrepreneur in residence put together a great product. If
            | they were any more open, it would not make sense.
        
             | bugglebeetle wrote:
             | > Open source has never been a trailblazer of innovation.
             | 
             | Except for, you know, all the major programming languages
             | and Linux, which make all that innovation possible in the
             | first place. Also, everything OpenAI is doing is based on
             | open source stuff from Google and others, so...
        
               | nashashmi wrote:
                | True. I should have added that open research has given
                | lots of puzzle pieces for innovation.
                | 
                | And open source products have led to many individual
                | contributions.
               | 
               | But again it's never been a trailblazer for innovation.
        
             | sainez wrote:
             | This could not be more untrue.
             | 
             | The world is littered with businesses that operate as
             | commercial wrappers around open source technology. Ever
              | heard of GitHub? What about macOS? AWS?
        
           | capableweb wrote:
           | Unfortunately, due to the law of names, StabilityAI will in
           | the future hit the same issue as OpenAI and do a 180,
           | unleashing very unstable AI to the world.
        
             | mike_hearn wrote:
             | More like Stability will turn out to be an unstable
             | company. Last we heard they were struggling to raise more
             | funding and might lose their CEO due to unclear business
             | models:
             | 
             | https://www.semafor.com/article/04/07/2023/stability-ai-
             | is-o...
        
               | paxys wrote:
               | The company can cease operations tomorrow, but the model
               | they open sourced (and all of its derivatives built by
               | the community) will continue to exist. If OpenAI
               | disappears then all of the work they have done goes with
               | it.
        
             | IceWreck wrote:
              | I'd rather have that than closed AI models controlled by a
              | few corporations.
        
             | seydor wrote:
             | then they both have ironic names now
        
         | jacooper wrote:
          | Not fully. The instruction-tuned model is CC-BY-NC-SA.
        
           | alex_sf wrote:
           | That's a limitation of the dataset used for that particular
           | tuned model. Probably not a great choice on their part given
           | that people aren't reading past the headline, but the actual
           | base model is not restricted.
        
           | dragonwriter wrote:
            | It's CC-BY-NC-SA because of the upstream sources used for
            | instruction training. There are open resources being
            | developed for that, from what I've seen, but probably
            | nothing is ready yet.
        
       | Kelamir wrote:
       | What does this mean in the context of https://futureoflife.org/open-
       | letter/pause-giant-ai-experime... , where the CEO of Stability AI
       | signed the petition (it's 8th in the list)? Do they want time to
       | catch up?
       | 
       | Regardless of that, I'm glad that StabilityAI enters the field as
       | well and releases models for public use.
        
         | sanxiyn wrote:
         | It doesn't mean anything. The letter says:
         | 
         | > Therefore, we call on all AI labs to immediately pause for at
         | least 6 months the training of AI systems more powerful than
         | GPT-4.
         | 
         | StableLM is not an AI system more powerful than GPT-4, so the
         | pause does not apply.
        
           | samstave wrote:
            | To whom, specifically, does the "pause" apply?
            | 
            | Because I can tell you that no state actor is going to
            | pause; they will amplify.
            | 
            | Israel, Iran, China, Russia and any self-respecting NATO
            | country is secretly pushing their AI as fast as f'n
            | possible.
            | 
            | You think the US is pausing anything with a 1 trillion
            | dollar defense budget, especially now that this AI has
            | surfaced?
            | 
            | The NSA has been projecting these capabilities forever....
            | 
            | Look at the movie "Enemy of the State" as a documentary on
            | capabilities as early as 1998... now look at the fractal
            | spiral that we are witness to (and victims of).
        
             | moonchrome wrote:
              | All these state actor stories imply that state actors are
              | anywhere close to GPT-4 levels. Do you have any evidence
              | for that?
        
               | samstave wrote:
                | >> _Do you have any evidence for that_
                | 
                | I do not, yet I am a _SUPER SKEPTIC_ -- meaning I am a
                | conspiracy weirdo that doesn't believe a gosh darn thing
                | any government says, but I am also a technologist who is
                | not ignorant of things that have been built in secrecy.
                | 
                | Thus, I surmise that some crazy shit is going on with AI
                | behind the scenes that we are not privy to -- and to
                | anyone whose reality is "you cannot believe that _they_
                | are doing anything with AI that we don't know about",
                | to paraphrase a few: "A nuke is literally about to fall
                | on our heads."
                | 
                | --
                | 
                | We are moments away from realizing that it ALREADY
                | happened....
        
             | goatlover wrote:
             | As far as we know, OpenAI has the most advanced LLM in the
             | world right now. Just because governments can fund all
             | sorts of expensive military contracts or spying networks
                | doesn't mean they have the cutting-edge AI research. A
                | lot of times it's companies that make the breakthroughs
                | in tech.
        
               | samstave wrote:
                | Just so I understand how to arrive at that conclusion:
                | how did you come to it?
               | 
               | Do we have _any_ transparent measure?
        
         | AppleBananaPie wrote:
         | I'd love to hear from someone knowledgeable in the area, but my
         | gut instinct was that the majority of people/organizations
         | signing on to that 'pause' just wanted to catch up.
        
           | sebzim4500 wrote:
           | Either that or the guy didn't actually sign it. It's not like
           | anyone actually checked who it was.
        
             | timestretch wrote:
             | He's mentioned signing it in interviews. He discusses it
             | here: https://youtu.be/SKoYhcC3HrM?t=1344
        
       | huseyinkeles wrote:
       | This is amazing. They even let developers use it for commercial
       | purposes:
       | 
       | "Developers can freely inspect, use, and adapt our StableLM base
       | models for commercial or research purposes, subject to the terms
       | of the CC BY-SA-4.0 license."
       | 
       | You can use this link to interact with the 7B model:
       | 
       | https://huggingface.co/spaces/stabilityai/stablelm-tuned-alp...
        
         | IceWreck wrote:
         | Is there a way to check the queue for this?
         | 
         | I sent it one small text (actually a task) five minutes ago.
         | It's still loading.
        
           | pksebben wrote:
            | Same. Probably hugged to death.
        
             | convery wrote:
             | It just finished building.
        
               | IceWreck wrote:
                | Just tested it. I tried to get it to write a conclusion
                | after giving it my report, and while it was correct, it
                | was kind of silly, a far cry from ChatGPT. But again,
                | this is the 7B variant, and it's open source.
        
       | ilaksh wrote:
       | Amazing, but the instruction fine-tuning is still a huge
       | challenge for businesses, since what is released cannot be used
       | for commercial purposes. Instruction-tuned models are much more
       | useful.
       | 
       | I have a feeling that there are probably some people who will
       | look at the "commercial okay" license for the first part and in
       | their mind that will somehow make it okay to use the instruction-
       | tuned ones for commercial purposes.
       | 
       | Maybe we don't really need the instruct stuff? Because it seems
       | like it's a huge amount of duplicated work. I wonder if the
       | OpenAssistant people will start building off of these models.
        
         | dougmwne wrote:
         | The instruct tuning can be done with several open datasets at
         | minimal cost. Should be easy for someone to create their own
         | open model.
        
           | jacooper wrote:
           | How?
        
             | dougmwne wrote:
             | https://github.com/tatsu-lab/stanford_alpaca
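              | 
              | The core of it is just formatting instruction/response
              | pairs into supervised fine-tuning examples. Roughly the
              | template that repo uses (simplified sketch; the repo also
              | has a variant with an "### Input:" section):
              | 
              |     TEMPLATE = (
              |         "Below is an instruction that describes a task. "
              |         "Write a response that appropriately completes "
              |         "the request.\n\n"
              |         "### Instruction:\n{instruction}\n\n### Response:\n"
              |     )
              | 
              |     example = {
              |         "instruction": "Explain what a language model is "
              |                        "in one sentence.",
              |         "output": "A language model predicts the next "
              |                   "token given the previous tokens.",
              |     }
              | 
              |     # tokenized and used as the fine-tuning target
              |     text = (TEMPLATE.format(instruction=example["instruction"])
              |             + example["output"])
              |     print(text)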
        
       ___________________________________________________________________
       (page generated 2023-04-19 23:00 UTC)