[HN Gopher] Show HN: LlamaGPT - Self-hosted, offline, private AI...
___________________________________________________________________
Show HN: LlamaGPT - Self-hosted, offline, private AI chatbot,
powered by Llama 2
Author : mayankchhabra
Score : 111 points
Date : 2023-08-16 15:05 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| belval wrote:
| Nice project! I could not find the information in the README.md:
| can I run this with a GPU? If so, what do I need to change? It
| seems to be hardcoded to 0 in the run script:
| https://github.com/getumbrel/llama-gpt/blob/master/api/run.s...
| crudgen wrote:
| Had the same thought, since it is kinda slow (I only have 4
| physical/8 logical cores though). But I think VRAM might be a
| problem: 8 GB can work if one has a fairly recent GPU (the
| M1/M2 might be interesting here).
| mayankchhabra wrote:
| Ah yes, running on GPU isn't supported at the moment. But CUDA
| (for Nvidia GPUs) and Metal support is on the roadmap!
| samspenc wrote:
| Ah fascinating, just curious: what's the technical blocker? I
| thought most of the Llama models were optimized to run on
| GPUs?
| caesil wrote:
| So many projects still using GPT in their name.
|
| Is the thinking here that OpenAI is not going to defend that
| trademark? Or are they just kicking the can down the road on
| rebranding until the C&D letter arrives?
| schappim wrote:
| They don't have the trademark yet.
|
| OpenAI has applied to the United States Patent and Trademark
| Office (USPTO) for domestic trademark registration of the term
| "GPT" in the field of AI. OpenAI sought to expedite handling
| of its application, but the USPTO declined that request in
| April 2023.
| khaledh wrote:
| This reminds me of the first generation of computers in the 40s
| and early 50s following the ENIAC: EDSAC, EDVAC, BINAC, UNIVAC,
| SEAC, CSIRAC, etc. It took several years for the industry to
| drop this naming scheme.
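For context on the GPU question above: in llama.cpp-based stacks, GPU use is typically controlled by how many transformer layers are offloaded to VRAM (the `n_gpu_layers` knob in llama-cpp-python, which the thread suggests the run script pins to 0). A minimal sketch of sizing that against the 8 GB VRAM concern; the per-layer memory figure is an illustrative assumption, not a measured number:

```python
# Rough sketch: estimate how many layers fit in VRAM, then pass that
# number to llama-cpp-python. The ~160 MB/layer figure for a
# quantized 7B model is an illustrative guess, not a measured value.

def layers_that_fit(vram_mb: int, per_layer_mb: int = 160,
                    reserve_mb: int = 1024) -> int:
    """Offload as many layers as fit, keeping some VRAM in reserve."""
    usable = max(vram_mb - reserve_mb, 0)
    return usable // per_layer_mb

if __name__ == "__main__":
    n = layers_that_fit(8192)            # an 8 GB card
    print(n)                             # layers to offload
    # Requires a CUDA- or Metal-enabled build of llama-cpp-python:
    # from llama_cpp import Llama
    # llm = Llama(model_path="models/llama-2-7b-chat.bin", n_gpu_layers=n)
```

Everything left over stays on the CPU, so a partial offload still helps on cards that can't hold the whole model.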
| super256 wrote:
| Well, GPT is simply an initialism for "Generative Pre-trained
| Transformer".
|
| In Germany, a trademark can be lost if it becomes a
| "Gattungsbegriff" (generic term). This happens when a trademark
| becomes so well known and widely used that it becomes the
| common term for a product or service, rather than being
| associated with a specific company or brand.
|
| For example, if a company invented a new type of vacuum cleaner
| and trademarked the name, but people then started using that
| name to refer to all vacuum cleaners, not just those made by
| the company, the trademark could be at risk of becoming a
| generic term, which would lead to its deletion. I think this is
| basically what is happening with GPT here.
|
| Btw, there are some interesting examples from the past where
| trademarks were lost because the brand name became too popular:
| Vaseline and Fon (hairdryer; everyone in Germany uses the term
| "Fon").
|
| I also found some trademarks that are at risk of being lost:
| "Lego", "Tupperware", "Post" (Deutsche Post/DHL), and "Jeep".
|
| I don't know how all this works in America, though. But it
| would honestly suck if such a generic term were approved as a
| trademark :/
| raffraffraff wrote:
| Actually, in the UK and Ireland a vacuum cleaner is called a
| Hoover. But in general I think we do that less than Americans.
| For example, we don't call a public address system a "Tannoy";
| that's a brand of hi-fi speakers. And we'd say "photocopier"
| instead of Xerox.
| lee101 wrote:
| [dead]
| SubiculumCode wrote:
| I didn't see any info on how this is different from
| installing/running llama.cpp or koboldcpp. New offerings are
| awesome of course, but what is it adding?
| mayankchhabra wrote:
| The main difference is that setting everything up yourself
| manually (downloading the model, optimizing the parameters for
| best performance, running an API server and a UI front-end) is
| out of reach for most non-technical people. With LlamaGPT, it's
| just one command: `docker compose up -d`, or a one-click
| install for umbrelOS home server users.
| SubiculumCode wrote:
| Thanks. Yeah, that IS useful.
|
| Anyone see if it contains utilities to import models from
| huggingface/github?
| DrPhish wrote:
| Maybe I've been at this for too long and can't see the
| pitfalls for a normal user, but how is that easier than using
| an oobabooga one-click installer (an option that's been
| around "forever")?
|
| I guess the ooba one-click doesn't come with a model included,
| but is that really enough of a hurdle to stop someone from
| getting it going?
|
| Maybe I'm just not seeing the value proposition of this. Glad
| to be enlightened!
| albert_e wrote:
| Oh, I thought this was a quick guide to host it on any server
| (AWS / other clouds) of our choosing.
| mayankchhabra wrote:
| Yes! It can run on any home server or cloud server.
| ryanSrich wrote:
| Interesting. I might try to get this to work on my NAS.
| reneberlin wrote:
| Good luck! The tokens/sec will be under your expectations, or
| it will overheat. You really shouldn't play games with your
| data storage. You could try it with an old laptop to see how
| badly it performs. Ruining your NAS just to show that "it
| worked somehow" is a bit over the top. But I don't know; maybe
| your NAS has a powerful processor, is tuned to the max, and
| you have redundancy and wouldn't mind losing a NAS? Or maybe
| this was just a joke and I fell for it! ;)
| mayankchhabra wrote:
| Not sure how powerful their NAS is, but on Umbrel Home (which
| has an N5105 CPU), it's pretty usable at ~3 tokens generated
| per second.
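Once the stack is up, the API server mentioned above can be exercised directly. A minimal sketch assuming an OpenAI-style chat-completions endpoint; the URL, port, and model name are illustrative assumptions, so check the project's README for the real values:

```python
# Minimal sketch of calling a local OpenAI-compatible chat endpoint,
# such as the API server LlamaGPT runs behind its UI. The URL, port,
# and model name below are illustrative assumptions.
import json
import urllib.request

API_URL = "http://localhost:3001/v1/chat/completions"  # assumed port

def build_request(prompt: str, model: str = "llama-2-7b-chat") -> dict:
    """Assemble an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ask("Why is the sky blue?") would work once the containers are up.
```

Because the wire format matches OpenAI's, existing OpenAI client libraries can usually be pointed at such an endpoint by overriding the base URL.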
| samspenc wrote:
| I had the same question initially and was a bit confused by the
| Umbrel reference at the top, but there's a section right below
| it titled "Install LlamaGPT anywhere else", which I think
| should work on any machine.
|
| As an aside, umbrelOS actually seems like a cool concept in
| itself; good to see these "self-hosted cloud" projects coming
| together in a unified UI. I may investigate this more at some
| point.
| QuinnyPig wrote:
| I've been looking for something like this for a while. Nice!
| Atlas-Marbles wrote:
| Very cool, this looks like a combination of chatbot-ui and
| llama-cpp-python? A similar project I've been using is
| https://github.com/serge-chat/serge. Nous-Hermes-Llama2-13b is
| my daily driver and scores high on coding evaluations
| (https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul...).
| lazzlazzlazz wrote:
| (1) What are the best more creative/less lobotomized versions
| of Llama 2? (2) What's the best way to get one of those
| running in a similarly easy way?
| mritchie712 wrote:
| Try llama2-uncensored:
|
| https://github.com/jmorganca/ollama
| lkbm wrote:
| https://github.com/jmorganca/ollama was extremely simple to
| get running on my M1, and has a couple of uncensored models
| you can just download and use.
| brucemacd wrote:
| https://github.com/jmorganca/ollama/tree/main/examples/priva...
| There's an example using PrivateGPT too.
| dealuromanet wrote:
| Is it private and offline via ollama? Are all ollama models
| private and offline?
| [deleted]
| avereveard wrote:
| I like this for turn-by-turn conversations:
| https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b
|
| This for zero-shot instructions:
| https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-...
|
| Easiest way would be
| https://github.com/oobabooga/text-generation-webui
|
| A little more complex way that I use: a stack with a llama.cpp
| server, an OpenAI adapter, and bettergpt as the frontend, using
| the OpenAI adapter as the custom endpoint. The bettergpt UX
| beats oobabooga by a long way (and ChatGPT in certain aspects).
| synaesthesisx wrote:
| How does this compare to just running llama.cpp locally?
| mayankchhabra wrote:
| It's an entire app (with a chatbot UI) that takes away the
| technical legwork of running the model locally. It's a simple
| one-line `docker compose up -d` on any machine, or a one-click
| install on umbrelOS home servers.
| chasd00 wrote:
| Is it a free model, or are the politically-correct-only
| response constraints in place?
| mayankchhabra wrote:
| It's powered by Nous Hermes Llama2 7B. From their docs: "This
| model stands out for its long responses, lower hallucination
| rate, and absence of OpenAI censorship mechanisms. [...] The
| model was trained almost entirely on synthetic GPT-4 outputs.
| Curating high quality GPT-4 datasets enables incredibly high
| quality in knowledge, task completion, and style."
| benreesman wrote:
| I'm a little out of date (busy few weeks), but didn't the
| Vicuna folks un-housebreak the LLaMA 2 language model (which
| is world class) with a slightly less father-knows-best
| Instruct tune?
| Havoc wrote:
| Llama is definitely "censored", though I've not found this to
| be an issue in practice. I guess it depends on what you want
| to do with it.
| redox99 wrote:
| llama-chat is censored, not base llama.
| ccozan wrote:
| OK, since this is all running privately, how can I add my own
| private data? For example, I have a 20+ year email archive
| that I'd like to be ingested.
| rdedev wrote:
| A simple way would be to do some form of retrieval on those
| emails and add the results back to the original prompt.
| cromka wrote:
| I imagine this means you'd need to come up with your own
| model, even if based on an existing one.
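The retrieval suggestion above can be sketched in a few lines: score stored emails against the question, then prepend the best matches to the prompt. Bag-of-words cosine similarity stands in purely for illustration; a real setup would use embeddings and a vector store, and all names here are made up:

```python
# Toy retrieval-augmented prompt builder for the email-archive idea.
# Word-count cosine similarity is a stand-in for real embeddings.
import re
from collections import Counter
from math import sqrt

def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word-count vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def augment_prompt(question: str, emails: list[str], k: int = 2) -> str:
    """Prepend the k most relevant emails as context for the model."""
    ranked = sorted(emails, key=lambda e: similarity(question, e),
                    reverse=True)
    context = "\n---\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {question}"

emails = [
    "Flight booking confirmed for Paris on May 3rd",
    "Your electricity bill for April is attached",
    "Reminder: dentist appointment next Tuesday",
]
print(augment_prompt("When is my flight to Paris?", emails, k=1))
```

No retraining is involved: the model stays frozen and only the prompt changes, which is why this is usually far easier than fine-tuning on the archive.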
| ravishi wrote:
| And is that hard? Sorry if this is a newbie question; I'm
| really out of the loop on this tech. What would be required?
| Computing power and tagging? Or can you improve the model
| without much human intervention? Can it be done incrementally
| with usage and user feedback? Would a single user even be able
| to generate enough feedback for this?
| phillipcarter wrote:
| Yes, this would be quite hard. Fine-tuning an LLM is no simple
| task. The tools and guidance around it are very new, and
| arguably not meant for non-ML engineers.
___________________________________________________________________
(page generated 2023-08-16 23:00 UTC)