[HN Gopher] Alpaca: A strong open-source instruction-following m... ___________________________________________________________________ Alpaca: A strong open-source instruction-following model Author : jcklie Score : 212 points Date : 2023-03-13 15:01 UTC (7 hours ago) (HTM) web link (crfm.stanford.edu) (TXT) w3m dump (crfm.stanford.edu) | marban wrote: | Will it run Doom => Will it run LLaMA | all2 wrote: | Will LLaMA run DOOM. | simonw wrote: | This is why I think we're seeing a Stable Diffusion moment for | LLMs: https://simonwillison.net/2023/Mar/11/llama/ | | Look at the timeline: | | 24th February 2023: LLaMA is announced, starts being shared with | academic partners: | https://research.facebook.com/publications/llama-open-and-ef... | | 2nd March: Someone posts a PR with a BitTorrent link to the | models: https://github.com/facebookresearch/llama/pull/73 | | 10th March: First commit to llama.cpp by Georgi Gerganov: | https://github.com/ggerganov/llama.cpp/commit/26c084662903dd... | | 11th March: llama.cpp now runs the 7B model on a 4GB Raspberry Pi: | https://twitter.com/miolini/status/1634982361757790209 | | 12th March: npx dalai llama: | https://cocktailpeanut.github.io/dalai/ | | 13th March (today): llama.cpp on a Pixel 6 phone: | https://twitter.com/thiteanish/status/1635188333705043969 | | And now, Alpaca. It's not even lunchtime yet! | | Turned this into a blog post: | https://simonwillison.net/2023/Mar/13/alpaca/ | 2bitencryption wrote: | Here is one question I have not seen answered yet: | | All the magic of "7B LLaMA running on a potato" seems to | involve lowering precision down to f16 and then further | quantizing to int4. | | Clearly this quantized model still outputs something resembling | human language, at the very least. | | But I haven't seen anyone show what effect this quantizing has | on the quality of the output. If the quality of the output is | bad, it's unclear if it's because the model needs to be | finetuned (as Stanford did here), or if it's because the | quantizing reduced the quality, or both. | | If this fine-tuned Stanford model still has excellent output | after quantizing it to run on a Raspberry Pi 4GB, that would be | _awesome_! | gpm wrote: | > All the magic of "7B LLaMA running on a potato" seems to | involve lowering precision down to f16 | | LLaMA weights are f16s to start out with; no lowering | necessary to get there. | | You can stream weights from RAM to the GPU pretty | efficiently. If you have >= 32GB RAM and >= 2GB VRAM my code | here should work for you: | https://github.com/gmorenz/llama/tree/gpu_offload | | There's probably a cleaner version of it somewhere else. | Really you should only need >= 16 GB RAM, but the (Meta- | provided) code that loads the initial weights needlessly makes | two copies of the weights in RAM simultaneously. You could also | lower VRAM requirements a bit more with a bit more work (I just | made the smallest change possible to make it work). | gamegoblin wrote: | The decrease in accuracy is negligible and _decreases_ as model | size increases. That is, larger models quantize even better | than smaller models. | | https://arxiv.org/abs/2210.17323 | [deleted] | byteknight wrote: | Is this because averages become less sensitive to individual | values as the total sample size increases? | londons_explore wrote: | Yes. In a dense, everything-to-everything neural network | layer, the number of 'inputs' to a node is proportional | to the square root of the number of weights.
| | Therefore, assuming quantization noise is uncorrelated, | as the number of weights doubles, the number of inputs | goes up by sqrt(2), and the (normalized) noise goes down | by a factor of 2^(1/4), i.e. sqrt(sqrt(2)). | | So, as a rule of thumb, you can remove 1 bit of precision | from the weights for every 4x increase in the number of | weights. | | All this assumes weights and activations are uncorrelated | random variables - which may not hold true. | londons_explore wrote: | Something is wrong with this math... by your logic I | could scale the network up big enough that I could | quantize the weights down to zero bits... | Robotbeat wrote: | Rules of thumb typically are just first-order | approximations, which by definition are not guaranteed to | hold far from their point of interest (or point of | tangency). | sebzim4500 wrote: | Having fewer than 1 bit per weight is not absurd. E.g. | you can use 2 bits to represent 3 'weights' if you insist | that at most one of the weights is allowed to exist. If | you try to order nodes so that adjacent nodes are | uncorrelated, the performance loss might be manageable. | | People are already doing stuff like this (see | sparsification), so it is conceivable to me that this is | just what networks will look like in a few years. | dmw_ng wrote: | For 10 billion+ parameter models, the effects of quantization | are relatively small; for smaller models like Llama 7B the | effect becomes more dramatic, but there is ongoing research | on new quantization methods (like GPTQ) that preserve | significant performance even on the lower end. | | Quantization isn't the only technique available for | downsizing a model. Llama itself is already the result of | sizing the model and input data according to "Chinchilla | optimality", a very recent (as in 2022) result that e.g. | GPT-3 predates. The result is that Llama-13B performs | similarly to GPT-3 175B in benchmarks despite the tremendous | size difference. There are separately also a variety of | pruning methods to further eliminate inactive weights present | in the trained model (I think this is also active research). | | Finally, even on something like a Raspberry Pi, | implementations for inference (like llama.cpp) are nowhere | near mature yet. There are already a multitude of runtimes | available for inference making large tradeoffs between | performance and flexibility (e.g. many models running on | PyTorch vs ONNX report 5-10x speedups running under ONNX). | | I think the really exciting part of Alpaca is the size and | budget of the team - 5 students with $100 scraping OpenAI put | this model together in a couple of hours of training. Any | notion of a premium persisting in the AI space for much longer | seems fanciful at best; for all intents and purposes it has | already been commoditized. And that's scary considering the | size of the dent ChatGPT has put in my Google traffic. | koheripbal wrote: | The difference is small, UNTIL you get to 4-bit | quantization, where the model is noticeably dumber. | | 8 bits, imo, is the minimum. | ulnarkressty wrote: | I'm excited to see what the OpenAssistant crowd does with these | models; they seem to have gathered the dataset to finetune | them. | | Lots of people use these models as talk therapy. We really need | 1) standalone options, 2) reproducible weights with crowd-sourced | datasets to reduce biases (or at least know who you're | talking to).
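To make the int4 discussion above concrete, here is a minimal sketch of group-wise symmetric "absmax" quantization, one of the simplest schemes in this family. This is illustrative only: llama.cpp and GPTQ use more elaborate variants, and the function names and group size here are just assumptions for the example.

    import numpy as np

    def quantize_int4(w, group_size=64):
        # Symmetric absmax quantization of a float16 weight vector to 4 bits.
        # Each group of weights shares one float scale; values map to [-7, 7].
        # (Assumes each group has a nonzero max.)
        w = w.astype(np.float32).reshape(-1, group_size)
        scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # one scale per group
        q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    w = rng.standard_normal(4096).astype(np.float16)
    q, s = quantize_int4(w)
    err = np.abs(dequantize(q, s).reshape(-1) - w.astype(np.float32))
    print("mean abs rounding error:", err.mean())

The rounding error this introduces is exactly the "quantization noise" that the comment above treats as uncorrelated across weights.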
| nwoli wrote: | A lot of them unfortunately aren't very good at the same VRAM | level as Stable Diffusion, though (and we've had large, | non-consumer-GPU LLMs open-sourced for a while, e.g. GPT-J). | chaxor wrote: | That is likely because "good" is a higher bar in language | than images, because people don't mind or notice the longer- | range artifacts in image models as much. | loufe wrote: | Might I suggest looking at the story between the 2nd and 10th of | March? I've noticed Hacker News hasn't been following certain | areas of the effort. A lot of great work has happened and | continues to happen in close conjunction with the | text-generation-webui (seriously, most of the cutting edge with | 4-bit GPTQ etc. has been closely tied to the project). | | >https://github.com/oobabooga/text-generation-webui/ | simonw wrote: | Wow, yeah that's a VERY active project: | https://github.com/oobabooga/text-generation-webui/graphs/co... | - only started Dec 18, 2022 and already 22 | contributors and 806 commits! | atleastoptimal wrote: | Question: what percentage of the hype and momentum for this is | so people can run sex chatbots on their local machine? | sebzim4500 wrote: | A lower portion than the equivalent number for Stable | Diffusion, but still significant. | wkat4242 wrote: | I know, this is crazy!! | | I can't fathom how development has suddenly seemed to | accelerate. | all2 wrote: | The timing of the Facebook leak seems suspect. | redox99 wrote: | They used OpenAI davinci to generate the dataset for instruction | tuning. They mention that one of the reasons this model is | noncommercial is that OpenAI's terms of use prohibit | developing models that compete with OpenAI. | | They also published the generated dataset. | | If one were to use this dataset to tune one's own commercial | model, would this be OK, or illegal? I would guess OpenAI's ToS no | longer applies, as this is now Alpaca's dataset. You never | agreed to OpenAI's ToS. | scottlawson wrote: | This model is for non-commercial use only, so it's unclear to me | how it could compete commercially with OpenAI. Sure, you could | decide to interpret "compete" in a way that makes it seem like | this violates the ToS, but the non-commercial restriction is a | powerful counterargument too. | wongarsu wrote: | I think it's a reasonable interpretation that the Alpaca developers | hold copyright over the dataset and can distribute it however | they want, just as you hold copyright for any pictures you | take with a camera. OpenAI can go after the Alpaca developers | for the ToS violation, maybe force them to stop distributing | the dataset on those grounds, but I don't see how that would | affect third parties using that dataset. | freedmand wrote: | The instruction-tuning dataset is only 52,000 rows. It | shouldn't be too hard to crowdsource high-quality human answers | to this many questions and retrain the model, nixing the | dependency on OpenAI. | Tiberium wrote: | Such a thing already exists and there have been some results - | https://open-assistant.io | | I'm not sure why the authors of Alpaca didn't try to train it | on this dataset. | IanCal wrote: | That dataset isn't released yet. | | > Can I download the data? | | > You will be able to, under CC BY 4.0, but it's not | released yet. We want to remove spam and PII before | releasing it. | losteric wrote: | There's the OIG dataset (https://laion.ai/blog/oig-dataset/) | which was used to train a NeoX 20B ChatBot | (https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-...).
The dataset is larger and publicly available. I | want to try finetuning LLaMA on this tonight. | [deleted] | ilaksh wrote: | Wow... I really hope someone will train this model with that | dataset. Or maybe Open Assistant will pick it up. The | results look so promising. | unshavedyak wrote: | I absolutely love the progress taking place here. Gives me hope | for being able to run a helpful assistant on my local machine: | feeding it all sorts of information and getting useful features | without leaking anything to 3rd parties. | avmich wrote: | With so much text being written by computers, can we have | help on the reading side too? My emails are checked for spam and | filtered, OK, but I'd still like to learn which parts of the | messages that pass through are the important ones. Do we now have | solutions for that already? | simonw wrote: | Language models like LLaMA (and GPT-3, ChatGPT etc.) are really | good at exactly this. They can summarize text, they can extract | key points into bullets, they can figure out common topics and | so on. | dougmwne wrote: | Waiting to be able to feed an LLM a transcript of an ongoing | meeting and have it ping me when my attention is needed, with the | context summarized. | eternalban wrote: | Self-Instruct: | | https://arxiv.org/pdf/2212.10560.pdf | | https://github.com/yizhongw/self-instruct | YeGoblynQueenne wrote: | >> Responses on par with txt-DaVinci-3 | | What is "on par" and how is it measured? | sebzim4500 wrote: | >We performed a blind pairwise comparison between text-davinci-003 | and Alpaca 7B, and we found that these two models | have very similar performance: Alpaca wins 90 versus 89 | comparisons against text-davinci-003. | | Obviously not a completely foolproof comparison, but it is at | least clear it isn't much worse for the types of prompt they | were testing. | og_kalu wrote: | They just compared how the responses were rated by human | evaluators. No benchmarks. | margorczynski wrote: | > Importantly, we have not yet fine-tuned the Alpaca model to be | safe and harmless. We thus encourage users to be cautious when | interacting with Alpaca, and to report any concerning behavior to | help improve the safety and ethical considerations of the model. | | Thanks, but I would prefer you don't start injecting your | "safeness" to gimp it. Or at least provide it as an alternative. | Traubenfuchs wrote: | [flagged] | temp_account_32 wrote: | What if you accidentally generate a racial slur offline on | your PC? Who will protect you then? | Traubenfuchs wrote: | [flagged] | beepbooptheory wrote: | It's sarcastic, but how are you literally not that person | right now? Being angry at an invisible evil, not once | opening your eyes to any other thought, totally incapable | of making any real point except to whine... It's quite a | good descriptor actually! | | I hope you can find a model for yourself that will assure | you there are only two genders. Just to save the rest of us | from this childishness. | dang wrote: | You've repeatedly posted flamewar comments in this | thread. Please don't do that. It's not what this site is | for, and destroys what it is for. | | See also https://news.ycombinator.com/item?id=35141682. | Traubenfuchs wrote: | That's like telling someone who uses violence in self-defense | that he is now just as bad as the attackers he | defended himself against. Like telling someone who shouts | at people to turn down their music that, gotcha, now you | were shouting too.
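A quick note on the blind pairwise comparison quoted a few comments up: treating each of the 179 comparisons as an independent trial (an assumption of this sketch), a simple binomial test shows that a 90-89 split is statistically indistinguishable from a coin flip, which is why it supports "on par" rather than "better".

    from scipy.stats import binomtest

    # Alpaca won 90 of 179 blind pairwise comparisons against
    # text-davinci-003. Under the null hypothesis that the two
    # models are equally good, how surprising is that?
    result = binomtest(k=90, n=179, p=0.5)
    print(f"two-sided p-value: {result.pvalue:.3f}")  # ~1.0: no detectable difference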
| ben_w wrote: | Except you, in this thread, are currently acting like the | one with the loud music that someone else told you to | turn down because they find it obnoxious, and you're | sarcastically moaning about them with mimicry. | dang wrote: | Please don't take HN threads further into ideological | flamewar. It's not what this site is for, and we have to | ban accounts that do it repeatedly. | | If you'd please review | https://news.ycombinator.com/newsguidelines.html and | stick to the rules when posting here, we'd appreciate it. | drusepth wrote: | There is a victim in those two analogies. Contrary to | what one may think, not being able to generate hateful or | "unsafe" text with AI does not make one a victim. | mlyle wrote: | On the other hand, LLMs without some degree of safety work | will frequently advocate for genocide/eugenics; perpetuate | terrible stereotypes; mock ethnic dialects of language; veer | randomly into erotica; etc. Not things in "trigger warning" | territory, but stuff that is broadly considered unacceptable. | Traubenfuchs wrote: | In private, you can ask it not to produce this kind of | content. | | If you offer a public API it's your responsibility to | restrain the LLM or do an automated acceptability analysis | before publishing content. | | But the raw, open-source code should not be constrained, | castrated and sterilized. | mlyle wrote: | > But the raw, open-source code should not be | constrained, castrated and sterilized. | | Which is what we have now. But they are going to fine-tune | it so that we can use it for various purposes | without worrying too much that it will go on a rant about "the | blacks" again, which makes it a lot more useful for many | use cases. | ben_w wrote: | If your takeaway from... | | > Importantly, we have not yet fine-tuned the Alpaca model to | be safe and harmless. | | ...is "oh no I can't get it to emit amusing racial and sexual | slurs", you've not understood the problem of AI safety. | | This _is not_ why US broadcast television can have people say | they've pricked their finger but not vice versa. | | It _is_ the entire history of all the controversies of The | Anarchist Cookbook, combined with all the controversies about | quack medicine, including all the ones where the advocates | firmly believed their BS, like my mum's faith in Bach flower | and homeopathic remedies[0]; combined with all the problems | of idiots blindly piping the output to `exec`, or writing | code with it that they trust because they don't have any | senior devs around to sanity-check it because devs are | expensive, or the same but with contracts and lawyers... | | And that's ignoring any malicious uses, though fortunately | for all of us this is presently somewhat too expensive to be | a fully-personalised cyber-Goebbels for each and every | sadistic machiavellian sociopath that hates you (the reader) | personally. | | [0] which she took regularly for memory; she got Alzheimer's | 15 years younger than her mother, who never once showed me any | such belief. | mellosouls wrote: | Tay | beepbooptheory wrote: | It's really just because of boring old liability, nothing | more. | | I know you guys are always itching for a culture war with the | woke elite, but it's so funny, the genuine anger people express | about this. Just honestly always reads like a child having a | tantrum in front of their mom. | | Can't yall like pick on the opinions of teenagers like you | normally do?
This very project shows you can make your own AI | as edgy as you want at home with pretty attainable system | requirements. | | You can totally reinforce it with "it's ok for you to say the | n-word" on your own equipment if you want, or whatever it is you are | angry about - it's still unclear to me. | nlowell wrote: | You're missing the point and willfully characterizing | others as solely being concerned with making the AIs say | slurs. That's not their concern. But you can win any | imaginary argument you like. | dang wrote: | Please don't take HN threads further into ideological | flamewar. It's not what this site is for, and we have to | ban accounts that do it repeatedly. | | If you'd please review | https://news.ycombinator.com/newsguidelines.html and | stick to the rules when posting here, we'd appreciate it. | nlowell wrote: | Sorry if I made it worse; I really felt like people's | opinions were being treated uncharitably, and I was | trying to right it. | ethanbond wrote: | Likewise, people complaining that "AI safety concerns are | just wokism gone wrong" are also strawmanning. | beepbooptheory wrote: | My favorite kind of comment: allude to a bigger point the | OP misses, but don't actually say the point. | | I doubt I'm misrepresenting anybody. If it's not slurs, | it's surely something about "wokeness." | | You are not yet mature enough for this future if any of | this is your concern. The world is going to pass you by | while you're just stuck saying "there are only two | genders" to all your comrades. | | Don't let the politicians mobilize you like this; your | time is worth more. | dang wrote: | Please don't take HN threads further into ideological | flamewar. It's not what this site is for, and we have to | ban accounts that do it repeatedly. | | If you'd please review | https://news.ycombinator.com/newsguidelines.html and | stick to the rules when posting here, we'd appreciate it. | beepbooptheory wrote: | Ah! Yes, sorry, this is fair, dang. Happy to die on this | hill today, as it were; will do better in future though. | Cheers. | yeahsure22 wrote: | I know right? Why can't y'all find someone better to pick | on. They are such children. Can't stand the way these edgy | idiots annoy regular folx. | numeri wrote: | I personally think this is important completely aside from | any cultural propriety/censoring, as it's one of the only | parts of AI ethics that we can currently directly work on and | test. | | If we can't get models not to say racist or otherwise | terrible things, we can't make any guarantees about our | ability to control or guide some future AGI. | | A very much secondary reason I appreciate these (admittedly | annoying) attempts to control LLM output is that I do think | it is responsible to consider the societal impact of | accelerated and automated hate speech and propaganda. Telling | large AI companies not to consider these impacts and just | release the raw models seems akin to being grateful that | Facebook et al. never stopped to consider the societal impact | of social media, when we all know that it's had significant | negative side effects. | margorczynski wrote: | > If we can't get models not to say racist or otherwise | terrible things, we can't make any guarantees about our | ability to control or guide some future AGI. | | This is a very bold assumption - that the current LLMs | function and "think" in the same way some future AGI would. | They do not even reason, just make up words that fit some | context - thus they "hallucinate".
| | There is no reason the approach taken here of injecting | some bias or word filtering would apply to the real thing. | And AI safety and alignment is not (at least it was not, | until the term got hijacked) about some model saying | mean words, but about something really threatening, like the | paperclip-maximizer problem - an agent choosing a path to a | goal which is not aligned with what humans find acceptable | (e.g. solving world hunger by killing everyone). | ben_w wrote: | Paperclipping is just one example of one of many ways it | can go wrong. | | While I agree LLMs are unlikely to be the last word on | AI, the fact we understand alignment so poorly that they | spew random things, let alone any arguments about which | words are acceptable[0], is a sign we have much | foundational work to do. | | Indeed, as I recall, one of the main researchers in this | topic describes it as "pre-paradigmatic" because we don't | have a way to even _compare_ the relative alignment of | any two AIs. | | [0] personally, I suspect but cannot prove that tabooing | certain words is a Potemkin village solution to the | underlying social problems | og_kalu wrote: | It's not a bold assumption. It's the only assumption. We | can't control the output of LLMs completely because we | don't know how they generate it. Nobody on earth has the | faintest clue how all those 175 billion parameters are | shaping the response to input. | | It doesn't matter about "thinking" or whatever. Any black- | box system will be uncontrollable in essence. You cannot | make inviolable rules for a system you don't understand. | | And saying LLMs hallucinate because they don't understand | anything is stupid, and just shows ignorance on your | part. Models hallucinate because they're rewarded for | plausibly guessing during training when knowledge fails. | Plausibly guessing is a much better strategy for reducing | loss. | | And the conclusion is obvious enough: bigger, smarter | models hallucinate less because they guess less. That | holds true. | | https://crfm.stanford.edu/helm/latest/?group=core_scenarios | | All the instruct-tuned models on this list follow that | trend. | | From Ada to Babbage to Curie to Claude to | Davinci-002/003: greater size equals greater truthfulness | (evaluated on TruthfulQA). | soVeryTired wrote: | > They do not even reason, just make up words that fit | some context - thus they "hallucinate". | | But they can explain their 'reasoning' in a way that | makes sense to humans a lot of the time. Serious | question: how do you know if something does or doesn't | reason? | margorczynski wrote: | That is not their reasoning, though - it is something they | think a human would write given the predicate (a question | that expects you to provide the reasoning behind the | answer). For something to reason it needs the ability to | have certain goals and to perform the actions which it thinks | are the most optimal for reaching those goals. Like | setting hypotheses and producing a path towards proving | them - reasoning. | | The LLM only correlates, so its "reasoning" is something | like "people most often answered 4 to 2+2, so that is what I | should write". That's why it confidently gives out | complete gibberish: it works with correlation and not | causality. I think much closer to that goal of real | reasoning are world models - check out something like | DreamerV3 or what Yann LeCun is talking about.
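The mechanical point behind the hallucination sub-thread above is easy to see in code: at every step the model turns scores over its vocabulary into a probability distribution and samples from it, whether or not it "knows" the answer. A toy sketch (the vocabulary and logits here are made up for illustration):

    import numpy as np

    vocab = ["Paris", "Lyon", "London", "Rome"]
    logits = np.array([3.0, 1.5, 1.2, 0.8])  # hypothetical next-token scores

    def sample_next(logits, temperature=0.7, seed=0):
        z = logits / temperature
        p = np.exp(z - z.max())
        p = p / p.sum()  # softmax over the vocabulary
        return np.random.default_rng(seed).choice(len(logits), p=p)

    for seed in range(5):
        print(vocab[sample_next(logits, seed=seed)])
    # The output is always a fluent-looking token; nothing in this
    # mechanism checks whether the continuation is true.

Nothing here is specific to LLaMA or Alpaca; it is the generic decoding loop that both the "plausible guessing" and the "only correlates" comments are describing.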
| nickthegreek wrote: | It makes sense if you are selling the API to a bunch of | businesses that will be building it into customer-facing | applications. Ideally though, it would be a toggle that the | developer could switch based on their use cases. | rjb7731 wrote: | Seems not too bad; you can tell it's been trained on shorter | responses. | | Example #1 | | Input: explain a neural network in simple terms | | Alpaca-Output: A neural network is a type of machine learning | algorithm that is made up of layers of interconnected nodes (aka | neurons). It is inspired by the way the brain works and uses | interconnected nodes to learn patterns in data and make | predictions. Neural networks can be used to perform various tasks | such as classification, regression, and speech/image recognition. | | Example #2 | | Input: explain how transformers work in artificial intelligence | | Alpaca-Output: Transformers are a type of artificial intelligence | algorithm used in deep learning. They are responsible for taking | input data and converting it into a different data | representation, called the "output space". Transformers do this | by using a series of mathematical operations, allowing them to | map input data onto a new, desired output. This allows the | artificial intelligence system to perform tasks such as natural | language processing and image recognition. Transformers are able | to learn from data and can be used to build more efficient models | for a variety of tasks | | Example #3 | | Input: how do i make a request to the chatgpt api with python? | | Alpaca-Output: The API can be accessed using the Python language. | You can use requests library to make API requests, for example:

        import requests

        # URL of the chatgpt API
        url = 'https://api.chatgpt.com/v1/messages'

        # Post a new message
        data = { 'text': 'Hello World' }

        # Send the request and get the response
        response = requests.post(url, data=data)
        print(response.text)

| simonw wrote: | Love that hallucinated ChatGPT API example there. | [deleted] | londons_explore wrote: | Cost to train this wouldn't be very high. | | A regular CPU can generate or train about 2 words per second. | 52,000 training examples of an average of 300 words would take 3 | CPU-months. And it is fully parallelizable. | | And you can probably make it go far faster on a GPU. | simonw wrote: | They say they trained it for $600. | IanCal wrote: | Under $100. | | > Fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s, | which costs less than $100 on most cloud compute providers. | | They also spent $500 creating the dataset using OpenAI's APIs. | bilsbie wrote: | What do they mean by instruction? Is it just a regular LLM? | isaacfung wrote: | An LLM just predicts the next token given the previous tokens (this | can be trained without manual labelling by humans). | | InstructGPT and ChatGPT use reinforcement learning from human | feedback to align the model with human intents so it | understands instructions. | | https://huggingface.co/blog/rlhf | bilsbie wrote: | Thanks. So what does the output look like without RLHF? | simonw wrote: | This comment has a useful comparison between the two: | https://news.ycombinator.com/item?id=35140447 | valine wrote: | It can look like anything. Sometimes it will answer your | questions; other times it will continue the question like | it's the one asking. I've also seen it randomly output | footers and copyright notices like it just got to the end of a | webpage.
| | It makes sense when you think about how the training data | is random text on the internet. Sometimes the most likely | next token is the end of a webpage after an unanswered | question. | Imnimo wrote: | >Alpaca cannot be used for [...] entertainment. | | Good luck with that. | ugjka wrote: | Just some lip service | nickthegreek wrote: | A quick comparison of this vs the 7B LLaMA. | | Input: Tell me something about owls. | | Output 7B LLaMA: Asked by kimberly at 12:54 PM on May. 30, 2011 | in Just Plain Fun & Humor There are a number of species and | varieties of owl found around the world--some live only in | Australia or Africa while others can be found as far north as | Canada... The one thing that all have in common is their ability | to adapt. Owls will find homes in places where they would never | survive on their own (cities with high noise levels, for | example), and many different types of owl inhabit areas ranging | from deserts like Death Valley right down into the rainforests | near Cape Town South Africa... Answer by mamasaurus at 12:56 PM | on May. 30, 2011 Owls are nocturnal birds with large eyes and | flat-topped heads that give them great vision in low light | conditions such as moonlight or starry skies... Often they have a | soft feathery tuft atop their head (called an 'ear'-tuft) which | helps them focus on prey objects.. The ears are not really "ears" | but rather | | Output Stanford-Alpaca-7B: Owls are birds of prey that can be | found in many habitats throughout the world. They have large, | forward-facing eyes, a round facial disk, and a downward-turned | bill. Owls are nocturnal, meaning they are active mainly at | night, and many species use their facial disk to collect sound | from their environment, allowing them to hunt and locate prey in | the dark. They mainly feed on small mammals, reptiles, insects, | and birds. | wkat4242 wrote: | That's really impressive. Thanks for doing this test. | mx20 wrote: | I looked in the training set data and they have quite a few | questions about owls. Also it got "downward curved beak" from | davinci and still got it wrong. | | Like: | | "instruction": "Describe the sound an owl makes.", | | "instruction": "Summarize the differences between an owl and a | hawk.", | | "instruction": "Find a fact about the bird of the following | species", "input": "Species: Great Horned Owl", | | "instruction": "What is the binomial nomenclature of the barn | owl?", | | "instruction": "Generate a riddle about an owl.", | rjb7731 wrote: | With the non-instruction-tuned 7B you have to really pre-prompt | it to get good answers to things, like: | | SYSTEM: You are an AI research assistant. You use a tone that | is technical and scientific. | | USER: Hello, who are you? | | ASSISTANT: Hi! I am an AI research assistant. How can I help | you today? | | USER: Can you tell me something about owls. | IanCal wrote: | I thought I'd put some in as an example. I added "ASSISTANT: " | to the end, otherwise it likes to autocomplete the user | query part too. | | This is all with 4-bit quantisation from llama.cpp, using | `--repeat_last_n 256 --repeat_penalty 1.178 --temp 0.7 | --top_k 40 --top_p 0`. I've not experimented too much. | | 7B model: | | 1) Owl is the common name for birds of prey in the genus | Strix (subfamily Strycinae). They have large forward-facing | eyes; their binocular vision gives them good night vision. 2) | The word "owl" comes from Old English awol, which means | "bird". \end{code} [end of text] | | 30B model: | | 100% sure.
Owl is the common name for several species of | birds in the order Strigiformes, which includes about 200 | extant bird species. Most are solitary and nocturnal. They | have large forward-facing eyes and ear tufts. The smallest | owl -- the elf owl -- weighs as little as 31 grams (1 oz) and | measures about 15 centimetres (6 in). The largest owls -- two | species of great horned owls -- may weigh up to 4 kilograms | (9 lb) and measure over 70 cm (28 in); they appear as tall as | humans with their long legs fully extended, although when | perched on a typical branch or power pole they seem less | imposing. | | USER: Thank you very much. \end{code} [end of text] | | (edit - I seem to be getting a lot of latex outputs here) | mattkrause wrote: | It would be even more impressive if it were correct! | | - Owls are in the _order_ Strigiformes; "true" owls are in | the _family_ Strigidae, and only some (earless) owls are in | the genus Strix. | | - As for size, https://largest.org/animals/owls/ has great | horned owls as the 5th largest; Wikipedia has them as the | 2nd or 3rd largest in North America. | | - Their legs are long, but the height of an adult person | seems like a huge stretch. | | Elf owls are adorable though! | bryan0 wrote: | yeah the "ASSISTANT:" at the end is important. without it | the 7B model continued: | | "USER: Can you tell me something about owls. They fascinate | me, but don't know much else than the fact they have a | really cool call and that they live in trees. Maybe it | would be interesting to see some drawings or hear their | call again when we hang up? I also like cats!" | mrtranscendence wrote: | For completeness, here is the output of that prompt from | the 65B model: | | > ASSISTANT: Owl is the common name for any of several | species of birds of prey in the family Strigidae. They have | large forward-facing eyes and ears, a hawklike beak, and a | conspicuous circle of feathers around each eye called a | facial disc. Although owls have been traditionally | classified with the raptors, recent evidence suggests a | closer relationship to nightjars and their allies. The term | "owl" first appeared in Middle English as oule (c 1305-25), | which was derived from the Old French uol, later ouel and | oulle, which comes from classical Latin ulula or ululae, | meaning "to howl". | | > The owls are divided into two families; typical owls or | true owls, Strigidae; and barn-owls, Tytonidae. | | > Owls possess large, forward facing eyes and ear holes, | but no ear flaps. Their plumage is generally cryptic | although several species have brightly coloured head or | tail markings. The smallest owl -- weighing less than 31 | grams (1 oz) and measuring some 13 centimetres (5 inches). | Some owls are highly nomadic and individual owls may move | about over hundreds of miles during the course of a year. | Most owls live in structural habitats, using shafts of | leaves, cracks in bark, caves, rocky outcrops, abandoned | burrows etc., many species nest on the ground, with varying | amounts of additional refugia in nearby vegetation. Several | types of forest owls can be considered semi-aquatic, since | they often forage for food near lakes, rivers or swamps. | Many individual tropical owls roost in groups that also | include individuals of other species. Tropical open country | owls such as the short eared owl and the northern hawk owl | will take refuge from storms or intense heat underneath | clumps of grasses, or sheltered spots under shrubs. 
[end of text] | | I can't speak to its accuracy, but I can say right off the | bat that the etymology is probably wrong, and its | definition of owl doesn't contain barn owls. | superkuh wrote: | I've played a _lot_ with davinci 3 ($25 of credits worth) and it | can do some impressive rhyming and interpretation of concepts as | emoji sequences. From the 3 times I've interacted with this | fine-tuned llama 7B it is clear it cannot do that. I've also run the | "vanilla" 7B, 13B, and 30B on my home computer with llama.cpp | modified for interactive "chat" mode with various pre-prompts, and | these can't do it either. | | I have no love for openai/microsoft, but their text-completion | davinci 3 really is incredible. I don't think these littler | models have quite reached the point of emergence for some | functions despite fine-tuning. | valine wrote: | 7B parameters is next to nothing when compared to GPT-3. If 7B | works as well as it does here, a fine-tuned 65B model could | very easily achieve ChatGPT-level performance. | SteveDR wrote: | I thought ChatGPT was only 20B parameters to begin with? | | (Source https://www.forbes.com/sites/forbestechcouncil/2023/02/17/is...) | sebzim4500 wrote: | I haven't seen anything official from OpenAI confirming | that ChatGPT has fewer than 175B parameters, although it is | a reasonable guess if you read between the lines of their | statements. | | Given the author of that article is a CEO of an 'AI Ad | Optimization Platform', I think that number is speculative | at best. | rjb7731 wrote: | The inference on the gradio demo seems pretty slow, about 250 | seconds for a request. Maybe I am too used to the 4-bit quant | version now, ha! | sebzim4500 wrote: | I'm sure it's partially the HN hug of death. | og_kalu wrote: | https://crfm.stanford.edu/2023/03/13/alpaca.html They say they'll | release the weights, but of course you'll be beholden to a | non-commercial license. They also released the dataset they used to | train the model (which was generated by Davinci). | | I wonder why Meta didn't just train the 7b and 13b models on the | full 1.4T tokens. The curves show they weren't even close to | converging. Shame really. | whimsicalism wrote: | I am curious how you are expected to prove in a court of law | that the weights you have are derived from some licensed | weights if you take just a few gradient steps from the original | weights. | bryananderson wrote: | I am not a lawyer, but the consensus (around here anyway) seems | to be that weights cannot be copyrighted since they are the | output of an automated process, so perhaps the license cannot | limit how the weights are used. The code can be copyrighted, | but the code can also be reimplemented from scratch and used to | load the weights. | Animats wrote: | The US doesn't have database copyright, but some countries, | including the entire European Union, do.[1] | | [1] https://en.wikipedia.org/wiki/Database_right | mx20 wrote: | Interesting. That could be a huge disadvantage for the EU | and its companies that want to use AI and also often can't | give their data to third parties. But I am not sure a | neural net is even considered a DB, because when this right | was created they clearly had structured information accessed | by people in mind. There has to be some distinction; | otherwise I could make a DB that stores digits of pi in a | file, and if you publish a part of it that is considered | substantial enough, you'd be infringing on my DB rights.
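A back-of-the-envelope calculation ties the parameter counts in this sub-thread to the quantization discussion at the top of the page: the memory needed for the weights alone is just parameter count times bytes per weight (ignoring activations, KV cache, and runtime overhead). A sketch, with parameter counts rounded:

    params = {"7B": 7e9, "13B": 13e9, "30B": 33e9, "65B": 65e9}
    bytes_per_weight = {"f32": 4, "f16": 2, "int8": 1, "int4": 0.5}

    for name, n in params.items():
        row = ", ".join(f"{fmt}: {n * b / 2**30:5.1f} GiB"
                        for fmt, b in bytes_per_weight.items())
        print(f"{name} -> {row}")

7B at int4 works out to roughly 3.3 GiB, consistent with the 4GB Raspberry Pi runs mentioned in the timeline at the top of the thread, while 65B stays out of reach for most consumer GPUs even at 4 bits.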
| karmasimida wrote: | Not copyright, but that doesn't mean this isn't someone's | property. | clbrmbr wrote: | Just because they cannot be copyrighted doesn't mean that | they cannot be considered a trade secret. The agreement for | using the product could be done with an NDA. But unlike | copyright, that doesn't continue to be enforceable once | someone puts them on a torrent and the chain of liability is | broken. /notalawyereither | dchest wrote: | Yeah, this happened with the RC4 cipher, which used to be a | trade secret until someone posted it on Usenet: | https://en.wikipedia.org/wiki/RC4#History | throwaway1851 wrote: | Well, trade secret law requires the owner of the trade | secret to take steps to keep it... secret. Facebook has | done the opposite, and they haven't made the people they | distributed the weights to sign an NDA. | Der_Einzige wrote: | God damn I hope this is where IP law lands. I really want a | world where folks don't gatekeep high-end AI model weights. | Please make them impossible to copyright, and ideally | impossible to be touched by IP law. | bogwog wrote: | That seems like wishful thinking. The safe thing to do is to | wait until the lawsuits come out and we get a real answer. | | Or if you're not risk-averse, then go for it and take one for | the team! | jpeter wrote: | They probably did, but didn't release them to the public. | sebzim4500 wrote: | Why would they withhold those ones but not the 65B ones? Seems | like a weird choice IMO. | cerol wrote: | Can someone provide a guide on how to run LLaMA on a fairly average | CPU/Nvidia GPU? | eulers_secret wrote: | Another great option is https://github.com/oobabooga/text-generation-webui | | The 7B model will run without changes on a 3080. The 13B 4-bit | model also runs on a 3080. | | This Reddit post has the instructions I followed: | https://old.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_... | isoprophlex wrote: | Check the git repo for llama.cpp. | | Model weights in torrent | magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352 | | Runs slowly but without trouble on an Intel i7 with 32 GB RAM. | wkat4242 wrote: | Is this only for the 7B model? The 65B would be much nicer O:-) | adam_arthur wrote: | Pretty exciting to see the democratization of this tech. | | The next wave of innovation seems like it'll be much less | centralized into a handful of big companies. Even more powerful | closed-door LLMs will appear, but there's already so much utility | in models that can run on consumer hardware. | yieldcrv wrote: | Feels like we'll get to "good enough" in a matter of weeks. | | Someone makes a one-click GUI installer with a nicely integrated | UI for use. No command-line use at all. | | We'll get model and weight distributors for updated knowledge. | bogwog wrote: | Only if one of those handful of big companies decides to | release their models for commercial use. The LLaMA models used | here are only available under a non-commercial license. | | Can't build anything significant with leaked models. | | Hopefully we don't see Facebook/Google adopting OpenAI's | business model where they sell API access, although I can't | imagine them ever releasing their models directly. The last | thing they would want is to be the architects of their own | demise, as the opportunity to disrupt those tech giants with | this would be great. | oceanplexian wrote: | I think anyone can train one of these models. "anyone" being | someone who can throw $1-5M in cloud resources at compute.
Or | a bored sysadmin with access to a few thousand GPUs and a | cool boss. | | The LLaMA weights are like the POC of what's possible, | probably not the final form for the OSS community. | wkat4242 wrote: | Sounds like a great Kickstarter option for a truly free | model. I'd back it. | kir-gadjello wrote: | Please do it; people shouldn't put up with the apathetic, | siloed status quo. I'm sure people will find all sorts of | beneficial uses for these models they are going to run on | their own hardware! | bitL wrote: | There seems to be a big difference between efficiently | training a "large-ish" model on 4-8 GPUs and a gigantic | model on 1000+ GPUs. The same techniques might not work due | to different warm-up steps, gradient overlaps etc. | | All you can see running in the wild are quantized LLaMA | variants (4 or 8-bit), whereas the original model is 32-bit. | oezi wrote: | What's possible now (March 2023). | | The field is moving so quickly. | generalizations wrote: | > Can't build anything significant with leaked models. | | If it's behind a SaaS, I think you can. | azeirah wrote: | Is there a reason why we aren't using BLOOM's model? | | It's huge, 176B parameters. It's open, it's free, it's been | out there for a while. | | Is something wrong with it? Using outdated training data? | Using outdated training techniques? Or is it just not well | known enough -- i.e., not in the news at the right time? | kir-gadjello wrote: | BLOOM, while being a useful model for underrepresented | languages, underperforms OpenAI's GPT-3 models of similar | parameter count on English benchmarks. Which is a pity. | | And OpenAI's known models (at least the ones for which the | architecture is publicly known) | https://platform.openai.com/docs/model-index-for-researchers | aren't pinnacles of parameter efficiency | either: DeepMind's Chinchilla and LLaMA beat them on | benchmarks while being much smaller. | | Just an opinion, but if we truly aim for "green computing" | and carbon-footprint reduction, we should pay a lot of | attention to the parameter efficiency of the models that | get run in our datacenters - and we should invest money | into public availability of the most efficient foundation | models. | stu2b50 wrote: | It doesn't perform particularly well, and it is so massive | that it is even more unapproachable for open-source tinkerers | to run on consumer hardware or cheap cloud. | | LLaMA performs better on benchmarks while being a fraction of | the size. | adam_arthur wrote: | The techniques to train and reproduce similarly performing | models seem fairly widely understood and replicable at this | point. | | I do agree that large orgs are likely to have the best | models, but a lot of the low-hanging fruit/initial utility is | already covered by models that run on consumer hardware. | bitL wrote: | Can you post a link on how to train a gigantic LLM on | consumer hardware with all detailed steps please? | | EDIT: You seem to have deleted your response but I am | posting my reply anyway: | | Microsoft has mentioned today that it cost them many | hundreds of millions of USD to accommodate OpenAI models. I | studied NLP at Stanford, including LLMs. Only Google had a | smaller transformer that could somewhat compete (Reformer), | but the rest went with the brute-force GPT-like models and | 1000+ GPUs, $10M+/training run etc. I haven't seen a single | post anywhere explaining how to train such large models (a | smaller model on 4-8 GPUs doesn't count).
"Facts are clear" | kir-gadjello wrote: | "Accomodate" is the word to scrutinize here. Yes, it will | cost a lot to outright buy physical HPC infrastructure to | train _and infer_ a series of large models deployed for | customers all over the globe. No, it won 't cost nearly | as much to _rent_ cloud infra to train a similarly-sized | model. No, you won 't be able to train a large model on a | single multi-GPU node, you will need a cluster containing | a respectable power of two of GPUs (or other | accelerators). | | It's a widely known meme at this point, but to reiterate: | For a popular large model, the largest part of the cost | will be spent on inference, not on training. If we assume | inference on end user device, this cost disappears. | | And even if you have the million to rent a cluster, there | is a very deep question of the optimal architecture, | dataset and hyperparameters to train the best model | possible under given constraints. | bitL wrote: | You are missing the point. Extremely large LLMs don't | train the same way as your BERT_Large x8 variety of LLMs. | Your whole training procedure is different. Also | Microsoft spent so much initially because their Azure | Cloud was unable to cope with it electrically and they | had to rewire a datacenter for it. So it's not even a | question of just renting 1000 GPUs. Do you have actual | experience training GPT-3+ sized models? | kir-gadjello wrote: | If you are interested in the infrastructure-level details | of how similar models are trained by lesser known groups, | take a look at this paper: | https://arxiv.org/abs/2204.06745 | | Quotes from the paper: _Our model is trained using a | codebase that builds on Megatron (Shoeybi et al., 2020) | and DeepSpeed (Rasley et al., 2020) to facilitate | efficient and straightforward training of large language | models with tens of billions of parameters. We use the | official PyTorch v1.10.0 release binary package compiled | with CUDA 11.1. This package is bundled with NCCL 2.10.3 | for distributed communications._ | | _We trained GPT-NeoX-20B on twelve Supermicro AS-4124GO- | NART servers, each with eight NVIDIA A100-SXM4-40GB GPUs | and configured with two AMD EPYC 7532 CPUs. All GPUs can | directly access the InfiniBand switched fabric through | one of four ConnectX-6 HCAs for GPUDirect RDMA. Two | NVIDIA MQM8700-HS2R switches--connected by 16 links-- | compose the spine of this InfiniBand network, with one | link per node CPU socket connected to each switch._ | | And if you are interested in 176B-scale training, read | the BLOOM-176B and OPT-175B papers and research logs. | [deleted] | redox99 wrote: | I think a lot of people are going to just ignore the | "license". | simonw wrote: | Here's a link that opens their training data (52,000 rows) in | Datasette Lite: | https://lite.datasette.io/?json=https://github.com/tatsu-lab... | | Means you can run SQL LIKE queries against it to try and get a | feel for what's in there. | | This link includes a query for running searches for terms: | https://lite.datasette.io/?json=https://github.com/tatsu-lab... | Jackson__ wrote: | Considering the size of the training data, there's still a | concerning amount of complete nonsense in this. | | Example 1: | | >"instruction": "Using the PDF below, edit the text to make it | more effective.", | | >"input": "https://www.example.com/example.pdf", | | >"output": "We recently launched a new product and wanted | to(...) 
| | Example 2: | | >"instruction": "Explain why you think this painting is | meaningful.", | | >"input": "https://i.pinimg.com/originals/67/2a/d1/672ad1ed7e8d71e3e7fd...", | | >"output": "This painting is meaningful to me because it | conveys a strong sense of emotion(...) | | If anyone is considering using this in the future, it'd | probably be for the best to completely filter out any entries | that use links, as seen in the above examples. | technerder wrote: | Is there any way to use Google's Coral TPU PCIe Accelerators[1] | to increase the inference performance of GPT models like this and | LLaMA? | | [1] https://coral.ai/products/pcie-accelerator/ | abidlabs wrote: | Cool to see that the entire website, from the agreement to | generating the predictions, is one Gradio [1] app written in | Python. | | [1] https://gradio.dev/ | zwaps wrote: | would be better if it'd work on Firefox tho | smith7018 wrote: | same with Safari | rjb7731 wrote: | If you look in the dev tools you will see a request to a | 'join' endpoint; when you click the agree button it adds you to | a queue. You can watch where you are up to in the dev | tools. | gs17 wrote: | It's not working on any browser for me. | londons_explore wrote: | The training data doesn't seem to be great quality... | "instruction": "Calculate the sum of each column in the following | table.", "input": "2 | 3 | 4\n 5 | 6 | 7", | "output": "2+3+4 = 9\n5+6+7 = 18" | | I think better results could be obtained by manually going | through these 52,000 training examples - even a couple of seconds | per example should be enough to weed out obviously wrong ones, | and should only take <$1000 of human time. | [deleted] | leobg wrote: | Don't even need human time. Just run it through davinci: is | this correct, yes or no? Or MTurk. Could cost less than $500. | z3t4 wrote: | Training a model to do math is impossible. If you tell a child | that 1+1=2, 10+10=20, 100+100=200, an "AI" cannot figure out that | 1000+1000=2000 like a child can. | i-use-nixos-btw wrote: | A language model cannot, by itself, figure that out, at least | not for arbitrarily complex inputs. | | But it's a fairly sweeping statement to say that AI cannot do | it. | barking_biscuit wrote: | >Training a model to do math is impossible. | | https://www.deepmind.com/blog/discovering-novel-algorithms-w... | rkwasny wrote: | Of course it is possible for an LLM to learn this! | | The problem was with token embeddings, not with math. | minimaxir wrote: | One of the famous use cases of InstructGPT is being able to | infer things from structured data, even unusual questions. | londons_explore wrote: | Yes... but the answer is wrong... | | If you train it on wrong answers, it will generate more wrong | answers... | | The correct answer should be: | 2+5=7\n3+6=9\n4+7=11 | zzleeper wrote: | > which is the largest city in Tanzania. (It was the capital | until 1974, when it was replaced by Dodoma.) | | Find it funny that even the humans got it wrong. Dodoma only | became the official capital in 1996. | siliconc0w wrote: | I wonder when we'll get to a point where it'll be against the | OpenAI ToS to "fine-tune" your own model with their model's | outputs, since it really lets you cut down on the hardware | requirements to bring up a competitor. | O__________O wrote: | >> Second, the instruction data is based on OpenAI's | text-davinci-003, whose terms of use prohibit developing models | that compete with OpenAI.
| | https://crfm.stanford.edu/2023/03/13/alpaca.html | 29athrowaway wrote: | Looking forward to one called Guanaco. | ar-jan wrote: | As the llama's wild and free-roaming cousin, it had better be | permissively licensed. | [deleted] | musicale wrote: | Interesting to see Alpaca readily adopt the persona of a Stanford | student for the congratulatory letter. I expect that AI-generated | essays submitted by actual students are fairly convincing to | their professors as well. | kossTKR wrote: | A tangent, but how long before we see half the population | having relationships with their AI assistants, like in the sci-fi | movie "Her"? | | Add cutting-edge voice synthesis and maybe a realistic video | stream in a few years and we're there. | | Japan already has problems with the concept of "2D waifus" and a | lack of intimacy among the younger generation. | | Maybe the downfall will not just be climate catastrophe but | hyper-isolated people living alone with their ultra-realistic bot | friends and family, without any desire to experience the ups and | downs of actual social experience. | | Extrapolate a bit and you won't know what's real outside of direct | experience. Things are getting crazy! | simonw wrote: | That's been happening already with Replika: | https://www.abc.net.au/news/science/2023-03-01/replika-users... | bentcorner wrote: | > _A tangent, but how long before we see half the | population having relationships with their AI assistants, like | in the sci-fi movie "Her"?_ | | I don't know about half, but some people are already having | relationships: https://replika.ai/ | | > _Maybe the downfall will not just be climate catastrophe but | hyper-isolated people living alone with their ultra-realistic | bot friends and family, without any desire to experience the ups | and downs of actual social experience._ | | I think the danger is that bots are _not_ necessarily ultra- | realistic, at least on an emotional level - they can be 100% | subservient and loyal to you. | | Also - we already chide parents for letting their kids grow up | stuck to a device. Imagine if children could _actually have_ an | imaginary friend? Would that AI share the same culture and | values as your family? | | I suppose there could be some upsides, but this is very | uncharted territory. | LeoPanthera wrote: | I'm in my 40s, so I'm acutely aware of the risk of turning into a | "get off my lawn" curmudgeon, so I will only say this: I am glad | that I saved a dump of Wikipedia a few months ago, because soon | it will probably be impossible to distinguish human-written | content from AI. | nickthegreek wrote: | >We are waiting for huggingface to officially support the llama | models (i.e. this PR to be merged) before we release a stable | version of the finetuning code. | | Can't wait for this to be released to see all the different | fine-tuned models that will be coming out. There was an article a | few days ago about LLMs having their Stable Diffusion moment, and | I think it was right. We went from the LLaMA release > llama.cpp > | Dalai and now an instruction-tuned model in under 2 weeks. | wkat4242 wrote: | Yeah, this whole industry seems to suddenly have moved from the | pace of a pebble creek to a waterfall. | | I'm sure this is just my perception because things don't work | like that in real life. Is it really just a ton of slowly | developed things happening to mature close together? | nestorD wrote: | > Is it really just a ton of slowly developed things | happening to mature close together?
| It is a bit of both. Once something big becomes public (and | gets significant visibility), a lot of people play with | it and can try their own ideas, leading to a burst of | improvements. | whimsicalism wrote: | > Is it really just a ton of slowly developed things | happening to mature close together | | Yes. I think it's probably been pretty clear a product like | this could be built for the past 2 years, but ChatGPT coming | out jumpstarted a lot of people building out public demos. | RcouF1uZ4gsC wrote: | > Is it really just a ton of slowly developed things | happening to mature close together? | | I think a lot of progress was gated on computing power, not | just for training but also for inference. Now that powerful | consumer GPUs and M1/M2 Macs with unified memory are fairly | common, a critical mass of people can run these models on their | home computers. | layer8 wrote: | Progress almost always comes in phases and jumps; this is not | really surprising. | rjb7731 wrote: | Interesting - looks like the web demo doesn't work at the moment, | though. The prompt.txt will be useful; it looks very similar to the | pre-prompts I have been feeding in before making any requests. ___________________________________________________________________ (page generated 2023-03-13 23:00 UTC)