[HN Gopher] Show HN: LlamaGPT - Self-hosted, offline, private AI...
       ___________________________________________________________________
        
       Show HN: LlamaGPT - Self-hosted, offline, private AI chatbot,
       powered by Llama 2
        
       Author : mayankchhabra
       Score  : 111 points
       Date   : 2023-08-16 15:05 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | belval wrote:
       | Nice project! I could not find the information in the README.md,
       | can I run this with a GPU? If so what do I need to change? Seems
       | like it's hardcoded to 0 in the run script:
       | https://github.com/getumbrel/llama-gpt/blob/master/api/run.s...
        
         | crudgen wrote:
          | Had the same thought, since it's kinda slow (though I only
          | have 4 physical/8 logical cores). But I think VRAM might be a
          | problem - 8 GB can work if one has a fairly recent GPU (Apple
          | M1/M2 might be interesting here).
        
         | mayankchhabra wrote:
         | Ah yes, running on GPU isn't supported at the moment. But CUDA
         | (for Nvidia GPUs) and Metal support is on the roadmap!
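A hedged sketch of what GPU support would typically involve, assuming the API server is built on llama-cpp-python (as the run script suggests). The flags and layer count below are illustrative assumptions, not documented LlamaGPT options:

```shell
# Rebuild llama-cpp-python with the cuBLAS (CUDA) backend enabled:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

# Then offload a non-zero number of layers to the GPU when starting the
# server (35 is a typical value for a 7B model on an 8 GB card):
python3 -m llama_cpp.server --model ./models/llama-2-7b-chat.bin --n_gpu_layers 35
```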
        
           | samspenc wrote:
           | Ah fascinating, just curious, what's the technical blocker? I
           | thought most of the Llama models were optimized to run on
           | GPUs?
        
       | caesil wrote:
       | So many projects still using GPT in their name.
       | 
       | Is the thinking here that OpenAI is not going to defend that
       | trademark? Or just kicking the can down the road on rebranding
       | until the C&D letter arrives?
        
         | schappim wrote:
         | They don't have the trademark yet.
         | 
         | OpenAI has applied to the United States Patent and Trademark
         | Office (USPTO) to seek domestic trademark registration for the
          | term "GPT" in the field of AI. OpenAI sought to expedite
         | handling of its application, but the USPTO declined that
         | request in April 2023.
        
         | khaledh wrote:
         | This reminds me of the first generation of computers in the 40s
         | and early 50s following the ENIAC: EDSAC, EDVAC, BINAC, UNIVAC,
         | SEAC, CSIRAC, etc. It took several years for the industry to
         | drop this naming scheme.
        
         | super256 wrote:
         | Well, GPT is simply an initialism for "Generative Pre-trained
         | Transformer".
         | 
         | In Germany, a trademark can be lost if it becomes a
         | "Gattungsbegriff" (generic term). This happens when a trademark
         | becomes so well-known and widely used that it becomes the
         | common term for a product or service, rather than being
         | associated with a specific company or brand.
         | 
         | For example, if a company invented a new type of vacuum cleaner
         | and trademarked the name, but then people started using that
         | name to refer to all vacuum cleaners, not just those made by
         | the company, the trademark could be at risk of becoming a
          | generic term, which would lead to deletion of the trademark.
          | I think this is basically what is happening with GPT here.
          | 
          | Btw, there are some interesting examples from the past where
          | trademarks were lost due to the brand name becoming too
          | popular: Vaseline and Fon (hairdryer; everyone in Germany uses
          | the term "Fon").
         | 
         | I also found some trademarks which are at risk of being lost:
         | "Lego", "Tupperware", "Post" (Deutsche Post/DHL), and "Jeep".
         | 
          | I don't know how all this stuff works in America, though. But
          | it would honestly suck if such a generic term were approved
          | as a trademark :/
        
           | raffraffraff wrote:
           | Actually in the UK and Ireland a vacuum cleaner is called a
           | Hoover. But in general I think we do that less than
           | Americans. For example, we don't call a public announcement
           | system a "Tannoy". That's a brand of hifi speakers. And we'd
           | say "photo copier" instead of Xerox.
        
       | lee101 wrote:
       | [dead]
        
       | SubiculumCode wrote:
       | I didn't see any info on how this is different than
       | installing/running llamacpp or koboldcpp. New offerings are
       | awesome of course, but what is it adding?
        
         | mayankchhabra wrote:
          | The main difference is that setting everything up yourself
          | manually - downloading the model, optimizing the parameters
          | for best performance, running an API server and a UI
          | front-end - is out of reach for most non-technical people.
          | With LlamaGPT, it's just one command: `docker compose up -d`,
          | or a one-click install for umbrelOS home server users.
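For reference, the one-command setup described above looks roughly like this (the repository URL appears earlier in the thread; the local port is an assumption):

```shell
# Clone the repo and bring up the API server and chat UI with Docker:
git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt
docker compose up -d
# The chat UI should then be reachable on a local port, e.g. http://localhost:3000
```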
        
           | SubiculumCode wrote:
           | thanks. yeah, that IS useful.
           | 
           | Anyone see if it contains utilities to import models from
           | huggingface/github?
        
           | DrPhish wrote:
           | Maybe I've been at this for too long and can't see the
           | pitfalls of a normal user, but how is that easier than using
           | an oobabooga one-click installer (an option that's been
           | around "forever")?
           | 
           | I guess ooba one-click doesn't come with a model included,
           | but is that really enough of a hurdle to stop someone from
           | getting it going?
           | 
           | Maybe I'm not seeing the value proposition of this. Glad to
           | be enlightened!
        
       | albert_e wrote:
       | Oh I thought this was a quick guide to host it on any server (AWS
       | / other clouds) of our choosing.
        
         | mayankchhabra wrote:
         | Yes! It can run on any home server or cloud server.
        
           | ryanSrich wrote:
           | Interesting. I might try to get this to work on my NAS.
        
              | reneberlin wrote:
              | Good luck! The tokens/sec will be below your expectations,
              | or it will overheat. You really shouldn't play games with
              | your data storage. You could try it on an old laptop
              | first to see how badly it performs. Ruining your NAS just
              | to show that "it worked somehow" is a bit over the top.
              | But I don't know - maybe your NAS has a powerful
              | processor, is tuned to the max, and you have redundancy
              | and don't care about losing a NAS? Or this was just a
              | joke and I fell for it! ;)
        
               | mayankchhabra wrote:
               | Not sure how powerful their NAS is, but on Umbrel Home
               | (which has an N5105 CPU), it's pretty useable with ~3
               | tokens generated per second.
        
         | samspenc wrote:
         | I had the same question initially, was a bit confused by the
         | Umbrel reference at the top, but there's a section right below
         | it titled "Install LlamaGPT anywhere else" which I think should
         | work on any machine.
         | 
         | As an aside, UmbrelOS actually seems like a cool concept by
         | itself btw, good to see these "self hosted cloud" projects
         | coming together in a unified UI, I may investigate this more at
         | some point.
        
       | QuinnyPig wrote:
       | I've been looking for something like this for a while. Nice!
        
       | Atlas-Marbles wrote:
       | Very cool, this looks like a combination of chatbot-ui and llama-
       | cpp-python? A similar project I've been using is
       | https://github.com/serge-chat/serge. Nous-Hermes-Llama2-13b is my
       | daily driver and scores high on coding evaluations
       | (https://huggingface.co/spaces/mike-ravkine/can-ai-code-
       | resul...).
        
       | lazzlazzlazz wrote:
       | (1) What are the best more creative/less lobotomized versions of
       | Llama 2? (2) What's the best way to get one of those running in a
       | similarly easy way?
        
         | mritchie712 wrote:
         | try llama2-uncensored
         | 
         | https://github.com/jmorganca/ollama
        
         | lkbm wrote:
         | https://github.com/jmorganca/ollama was extremely simple to get
         | running on my M1 and has a couple uncensored models you can
         | just download and use.
        
           | brucemacd wrote:
           | https://github.com/jmorganca/ollama/tree/main/examples/priva.
           | .. there's an example using PrivateGPT too
        
             | dealuromanet wrote:
             | Is it private and offline via ollama? Are all ollama models
             | private and offline?
        
         | [deleted]
        
         | avereveard wrote:
         | I like this for turn by turn conversations:
         | https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b
         | 
         | this for zero shot instructions: https://huggingface.co/Open-
         | Orca/OpenOrcaxOpenChat-Preview2-...
         | 
         | easiest way would be https://github.com/oobabooga/text-
         | generation-webui
         | 
          | A slightly more complex setup I use is a stack with a
          | llama.cpp server, an OpenAI adapter, and bettergpt as the
          | frontend, using the OpenAI adapter as the custom endpoint.
          | bettergpt's UX beats oobabooga's by a long way (and ChatGPT's
          | in certain aspects)
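The "custom endpoint" part of a stack like this is just an OpenAI-compatible HTTP API served locally. A hedged sketch (the port, path, and model name are assumptions about such a setup, not documented values):

```shell
# Point any OpenAI-style client, or plain curl, at the local adapter
# instead of api.openai.com:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nous-hermes-llama2-13b",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```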
        
       | synaesthesisx wrote:
        | How does this compare to just running llama.cpp locally?
        
         | mayankchhabra wrote:
         | It's an entire app (with a chatbot UI) that takes away the
         | technical legwork to run the model locally. It's a simple one
         | line `docker compose up -d` on any machine, or one click
         | install on umbrelOS home servers.
        
       | chasd00 wrote:
        | is it a free model, or are the politically-correct-only
        | response constraints in place?
        
         | mayankchhabra wrote:
         | It's powered by Nous Hermes Llama2 7b. From their docs: "This
         | model stands out for its long responses, lower hallucination
         | rate, and absence of OpenAI censorship mechanisms. [...] The
         | model was trained almost entirely on synthetic GPT-4 outputs.
         | Curating high quality GPT-4 datasets enables incredibly high
         | quality in knowledge, task completion, and style."
        
         | benreesman wrote:
         | I'm a little out of date (busy few weeks), didn't the Vicuna
         | folks un-housebreak the LLaMA 2 language model (which is world
         | class) with a slightly less father-knows-best Instruct tune?
        
         | Havoc wrote:
         | Llama is definitely "censored" though I've not found this to be
         | an issue in practice. Guess it depends on what you want to do
         | with it
        
           | redox99 wrote:
           | llama-chat is censored, not base llama
        
       | ccozan wrote:
        | Ok, since it's all running privately, how can I add my own
        | private data? For example, I have a 20+ year email archive
        | that I'd like to have ingested.
        
         | rdedev wrote:
         | A simple way would be to do some form of retrieval on those
         | emails and add those back to the original prompt
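A minimal sketch of that retrieval idea, scoring emails by plain word overlap instead of embeddings (all names here are illustrative; LlamaGPT itself has no such feature):

```python
def retrieve(question: str, emails: list[str], k: int = 2) -> list[str]:
    """Return the k emails sharing the most words with the question."""
    q_words = set(question.lower().split())
    return sorted(
        emails,
        key=lambda e: len(q_words & set(e.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(question: str, emails: list[str]) -> str:
    """Prepend the retrieved emails to the question as context."""
    context = "\n---\n".join(retrieve(question, emails))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

emails = [
    "Your flight to Berlin departs Monday at 9am.",
    "Invoice #42 for server hosting is attached.",
    "Dinner on Friday? Let me know.",
]
print(build_prompt("When does my flight to Berlin leave?", emails))
```

A real setup would swap the overlap score for embedding similarity, but the shape of the final prompt stays the same.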
        
         | cromka wrote:
          | I imagine this means you'd need to come up with your own
          | model, even if it's based on an existing one.
        
           | ravishi wrote:
           | And is that hard? Sorry if this is a newbie question, I'm
           | really out of the loop on this tech. What would be required?
           | Computing power and tagging? Or can you like improve the
           | model without much human intervention? Can it be done
           | incrementally with usage and user feedback? Would a single
           | user even be able to generate enough feedback for this?
        
             | phillipcarter wrote:
             | Yes, this would be quite hard. Fine-tuning an LLM is no
             | simple task. The tools and guidance around it are very new,
             | and arguably not meant for non-ML Engineers.
        
       ___________________________________________________________________
       (page generated 2023-08-16 23:00 UTC)