[HN Gopher] PowerInfer: Fast Large Language Model Serving with a...
       ___________________________________________________________________
        
       PowerInfer: Fast Large Language Model Serving with a Consumer-Grade
       GPU [pdf]
        
       Author : georgehill
       Score  : 21 points
       Date   : 2023-12-19 21:24 UTC (1 hour ago)
        
 (HTM) web link (ipads.se.sjtu.edu.cn)
 (TXT) w3m dump (ipads.se.sjtu.edu.cn)
        
       | LoganDark wrote:
       | > PowerInfer's source code is publicly available at
       | https://github.com/SJTU-IPADS/PowerInfer
       | 
       | ---
       | 
       | Just curious - PowerInfer seems to market itself on running very
       | large models (40B, 70B) on something like a 4090. If I have, say,
       | a 3060 12GB and I want to run something like a 7B or 13B, can I
       | expect the same speedup of around 10x? Or does this only help
       | that much for models that wouldn't already fit in VRAM?
        
       ___________________________________________________________________
       (page generated 2023-12-19 23:00 UTC)