[HN Gopher] FlexGen: Running large language models on a single GPU
___________________________________________________________________

FlexGen: Running large language models on a single GPU

Author : behnamoh
Score  : 126 points
Date   : 2023-03-26 05:31 UTC (17 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| aliljet wrote:
| This is absolutely something I was running into with LLaMA. I'm
| curious if this potentially extends into that particular use
| case...

| stavros wrote:
| What's the currently best-performing LLM that one can run with
| this?

| zargon wrote:
| Previous discussion is here (266 comments):
| https://news.ycombinator.com/item?id=34869960

| kkielhofner wrote:
| What's really amazing about a lot of these recent projects is
| that they tend to provide benchmarks running on an Nvidia T4.
| They use these because they're relatively cheap from cloud
| providers and you can usually actually get them (as opposed to
| requesting and getting denied for an A100 or whatever).
|
| For those who aren't familiar with them, they are tiny power-
| and density-optimized GPUs. I have the successor (A2), and its
| total max TDP is 60 watts. Single slot, slot power only, and
| passively cooled.
|
| Depending on workload I observe it to be roughly 5-10x slower
| than a 3090, which means most people at home with a spare Nvidia
| gaming card (or whatever) will see results from these projects
| at a performance multiple of the benchmarks they provide.
|
| The one caveat is that the T4/A2 have 16GB VRAM, which makes
| them more capable (albeit slower) than a "low end" desktop card
| like the 3070, which has only 8GB VRAM. But as HN readers know,
| there is incredible progress daily on reducing the VRAM
| requirements of these models!
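A minimal sketch of the VRAM-reduction idea mentioned in the comment
above, using Hugging Face transformers with bitsandbytes int8
quantization and accelerate's automatic device placement. This is not
FlexGen's own API; the model name, memory behavior, and generation
settings are illustrative assumptions.

    # Hedged sketch: int8 weights + automatic GPU/CPU layer placement.
    # Requires: pip install transformers accelerate bitsandbytes
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "facebook/opt-6.7b"  # assumption: any causal LM on the Hub
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,   # int8 weights via bitsandbytes, roughly
                             # halving VRAM use versus fp16
        device_map="auto",   # accelerate places layers on the GPU,
                             # spilling to CPU RAM if VRAM runs out
    )

    prompt = "FlexGen runs large language models on a single GPU by"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))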
| [deleted]

| lxe wrote:
| The best way to mess around with FlexGen and LLMs on local
| hardware in general is
| https://github.com/oobabooga/text-generation-webui

| boppo1 wrote:
| Can't get it to run on an AMD 6700 XT even though there are ROCm
| installation instructions. Tried to run LLaMA 7B but got hung up
| because bitsandbytes calls CUDA.

| [deleted]

| [deleted]

| stuckinhell wrote:
| This is absolutely stunning work. Excited to try it out on my
| husband's homelab.

| ByThyGrace wrote:
| I'm just here waiting for the "LLM retard guide" to come out, as
| happened for stable diffusion last August.

| arthurcolle wrote:
| Link to the stable diffusion ref you're referring to? I was able
| to run the model and everything, so I'm pretty familiar, but
| just wondering if you're referring to a specific document! Haha

| neilv wrote:
| > _This project was made possible thanks to a collaboration with
| ... Yandex Research ..._
|
| I'm all for global cooperation and fellowship. Are sanctions
| going to be a barrier for this and related projects?

| blagie wrote:
| It depends on the work and the sanctions regime, but in general:
|
| - The best sanctions impact targeted industries (e.g. anything
| needed to build tanks, warplanes, etc.).
|
| - The worst sanctions impact communications and collaboration.
| Change comes from conversations. Media, non-military education,
| and non-military academic collaboration are usually bad targets
| for sanctions.

| lxe wrote:
| Bing says: I could not find any specific information on how
| these sanctions affect academic research and cooperation between
| the US and Russia. Some sources suggest that some academics have
| canceled conferences, joint projects, and funding with Russian
| institutions as a form of self-imposed sanctions [2], while
| others indicate that Russian students are still able to secure
| visas to study abroad [3]. Therefore, the impact of the
| sanctions on academic research and cooperation may vary
| depending on the field, institution, and individual
| circumstances.
___________________________________________________________________
(page generated 2023-03-26 23:00 UTC)