[HN Gopher] PowerInfer: Fast Large Language Model Serving with a...
___________________________________________________________________

PowerInfer: Fast Large Language Model Serving with a Consumer-Grade
GPU [pdf]

Author : georgehill
Score  : 21 points
Date   : 2023-12-19 21:24 UTC (1 hour ago)

(HTM) web link (ipads.se.sjtu.edu.cn)
(TXT) w3m dump (ipads.se.sjtu.edu.cn)

| LoganDark wrote:
| > PowerInfer's source code is publicly available at
| > https://github.com/SJTU-IPADS/PowerInfer
|
| ---
|
| Just curious - PowerInfer seems to market itself by running very
| large models (40B, 70B) on something like a 4090. If I have, say,
| a 3060 12GB and I want to run something like a 7B or 13B, can I
| expect the same speedup of around 10x? Or does this only help
| that much for models that wouldn't already fit in VRAM?
___________________________________________________________________
(page generated 2023-12-19 23:00 UTC)