[HN Gopher] FlexGen: Running large language models on a single GPU
___________________________________________________________________

FlexGen: Running large language models on a single GPU

Author : behnamoh
Score  : 126 points
Date   : 2023-03-26 05:31 UTC (17 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| aliljet wrote:
| This is absolutely something I was running into with LLaMA. I'm
| curious if this potentially extends into that particular use
| case...

| stavros wrote:
| What's the currently best-performing LLM that one can run with
| this?

| zargon wrote:
| Previous discussion is here (266 comments):
| https://news.ycombinator.com/item?id=34869960

| kkielhofner wrote:
| What's really amazing about a lot of these recent projects is
| that they tend to provide benchmarks running on an Nvidia T4.
| They use these because they're relatively cheap from cloud
| providers and you can usually actually get them (as opposed to
| requesting and getting denied for an A100 or whatever).
|
| For those who aren't familiar with them, they are tiny power-
| and density-optimized GPUs. I have the successor (A2), and its
| total max TDP is 60 watts. Single slot, slot power only, and
| passively cooled.
|
| Depending on workload I observe it to be roughly 5-10x slower
| than a 3090, which means most people at home with a spare Nvidia
| gaming card (or whatever) will see results from these projects
| at a performance multiple of the benchmarks they provide.
|
| The one caveat is that the T4/A2 have 16GB VRAM, which makes
| them more capable (albeit slower) than a "low end" desktop card
| like the 3070, which has only 8GB VRAM. But as HN readers know,
| there is incredible progress daily on reducing the VRAM
| requirements of these models!
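A minimal sketch of the VRAM-reduction idea mentioned in the comment
above, using Hugging Face transformers with bitsandbytes int8
quantization and accelerate's automatic device placement. This is not
FlexGen's own API; the model name, memory behavior, and generation
settings are illustrative assumptions.

    # Hedged sketch: int8 weights + automatic GPU/CPU layer placement.
    # Requires: pip install transformers accelerate bitsandbytes
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "facebook/opt-6.7b"  # assumption: any causal LM on the Hub
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        load_in_8bit=True,   # int8 weights via bitsandbytes, roughly
                             # halving VRAM use versus fp16
        device_map="auto",   # accelerate places layers on the GPU,
                             # spilling to CPU RAM if VRAM runs out
    )

    prompt = "FlexGen runs large language models on a single GPU by"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))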
| [deleted]

| lxe wrote:
| The best way to mess around with FlexGen and LLMs on local
| hardware in general is
| https://github.com/oobabooga/text-generation-webui

| boppo1 wrote:
| Can't get it to run on an AMD 6700 XT even though there are ROCm
| installation instructions. Tried to run LLaMA 7B but got hung up
| because bitsandbytes calls CUDA.

| [deleted]

| [deleted]

| stuckinhell wrote:
| This is absolutely stunning work. Excited to try it out on my
| husband's homelab.

| ByThyGrace wrote:
| I'm just here waiting for the "LLM retard guide" to come out, as
| happened for stable diffusion last August.

| arthurcolle wrote:
| Link to the stable diffusion ref you're referring to? I was able
| to run the model and everything, so I'm pretty familiar, but
| just wondering if you're referring to a specific document! Haha

| neilv wrote:
| > _This project was made possible thanks to a collaboration with
| ... Yandex Research ..._
|
| I'm all for global cooperation and fellowship. Are sanctions
| going to be a barrier for this and related projects?

| blagie wrote:
| It depends on the work and the sanctions regime, but in general:
|
| - The best sanctions impact targeted industries (e.g. anything
| needed to build tanks, warplanes, etc.).
|
| - The worst sanctions impact communications and collaboration.
| Change comes from conversations. Media, non-military education,
| and non-military academic collaboration are usually bad targets
| for sanctions.

| lxe wrote:
| Bing says: I could not find any specific information on how
| these sanctions affect academic research and cooperation between
| the US and Russia. Some sources suggest that some academics have
| canceled conferences, joint projects, and funding with Russian
| institutions as a form of self-imposed sanctions [2], while
| others indicate that Russian students are still able to secure
| visas to study abroad [3]. Therefore, the impact of the
| sanctions on academic research and cooperation may vary
| depending on the field, institution, and individual
| circumstances.
___________________________________________________________________
(page generated 2023-03-26 23:00 UTC)