[HN Gopher] Making AMD GPUs competitive for LLM inference
       ___________________________________________________________________
        
       Making AMD GPUs competitive for LLM inference
        
       Author : djoldman
       Score  : 51 points
       Date   : 2023-08-09 18:15 UTC (4 hours ago)
        
 (HTM) web link (blog.mlc.ai)
 (TXT) w3m dump (blog.mlc.ai)
        
       | tails4e wrote:
        | Being 90% of a 4090 makes the 7900 XTX very attractive from a
        | cost-per-compute perspective, as it's about 65% of the price, and
        | power draw is significantly lower too.
        
         | nkingsy wrote:
          | A brutal slandering of AMD's marketing tactics, disguised as a
          | benchmark summary:
         | 
         | https://gpu.userbenchmark.com/AMD-RX-7900-XT/Rating/4141
        
           | zdw wrote:
           | There's a reason this site is banned from most hardware
           | forums.
        
           | GaggiX wrote:
            | It has been a while since I saw anyone send a Userbenchmark
            | link. Userbenchmark does not have a good reputation, and you
            | can find many explanations online for why; I guess the site's
            | moderators just ignore it by pretending it's all "Advanced
            | Marketing". Meanwhile, even AMD's CPU competitor has banned
            | the site from their subreddit. One of my favorite
            | explanations is https://youtu.be/RQSBj2LKkWg but there are of
            | course more recent ones.
        
           | jacoblambda wrote:
            | Really? You are referring to Userbenchmark's assessment of
            | AMD?
            | 
            | Look at literally any of their assessments of AMD products
            | and they'll be dragging them through the mud while pointing
            | to worse-performing Intel products to flex how much better
            | those are because they aren't AMD.
        
       | superkuh wrote:
       | > AMD GPUs using ROCm
       | 
       | Oh great. The AMD RX 580 was released in April 2018. AMD had
       | already dropped ROCm support for it by 2021. They only supported
       | the card for 3 years. 3 years. It's so lame it's bordering on
       | fraudulent, even if not legally fraud. Keep this in mind when
       | reading this news. The support won't last long, especially if you
        | don't buy at launch. Then you'll be stuck in the dependency hell
        | of trying to use old drivers and an old stack.
        
         | crowwork wrote:
          | There is also Vulkan support, which should be more universal
          | (it's also covered in the post); for example, the post shows
          | running an LLM on a Steam Deck APU.
        
         | junrushao1994 wrote:
          | TBH I'm not sure what AMD's plan is for ROCm support on
          | consumer devices, but I don't really think AMD is being
          | fraudulent or anything.
          | 
          | Both ROCm and Vulkan are supported in MLC LLM, as mentioned in
          | our blog post. We are aware that ROCm is not sufficient to
          | cover consumer hardware, and in that case Vulkan is a nice
          | backup!
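          | 
          | As a minimal sketch of what picking a backend could look like
          | with the mlc_chat Python package (the constructor arguments,
          | device strings, and model name here are assumptions and may
          | differ between releases):
          | 
          |     # Illustrative sketch only; argument names and values are
          |     # assumptions, not the exact API of a specific release.
          |     from mlc_chat import ChatModule
          | 
          |     # "rocm" on officially supported AMD GPUs; "vulkan" as the
          |     # more universal fallback (e.g. a Steam Deck APU).
          |     cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1",
          |                     device="vulkan")
          |     print(cm.generate(prompt="Hello!"))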
        
           | zorgmonkey wrote:
           | How does the performance with Vulkan compare to the ROCm
           | performance on the same hardware?
        
       | junrushao1994 wrote:
        | One of the authors here. Glad it's on Hacker News!
       | 
       | There are two points I personally wanted to make through this
       | project:
       | 
        | 1) With a sufficiently optimized software stack, AMD GPUs can be
        | cost-efficient to use for LLM serving.
        | 
        | 2) Machine learning compilation (MLC) techniques, through the
        | underlying TVM Unity software stack, are the best fit for
        | performance optimizations that generalize across hardware while
        | quickly delivering time-to-market value.
       | 
       | So far, to the best of our knowledge, MLC LLM delivers the best
       | performance across NVIDIA and AMD GPUs in single-batch inference
       | on quantized models, and batched/distributed inference is on the
       | horizon too.
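        | 
        | For a concrete sense of the single-batch setup, here is a rough
        | sketch using the mlc_chat Python API on a 4-bit (q4f16_1)
        | quantized Llama 2 model; the model name and method names are
        | assumptions, so check the docs for the exact API:
        | 
        |     # Illustrative sketch: single-batch inference on a 4-bit
        |     # quantized model; names are assumptions, not exact API.
        |     from mlc_chat import ChatModule
        | 
        |     cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1")
        |     print(cm.generate(prompt="Explain ROCm in one paragraph."))
        |     # Runtime stats, including prefill/decode tokens per second.
        |     print(cm.stats())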
        
         | JonChesterfield wrote:
         | Did the ROCm 5.6 toolchain work for you out of the box? If not,
         | what sort of hacking / hand holding did it need?
         | 
          | I don't know whether there's an LLM inference benchmark in the
          | CI suite; if not, perhaps something like this should be
          | included in it.
        
           | crowwork wrote:
            | Yes, it works out of the box, and the blog post includes a
            | prebuilt Python package that you can try out.
        
           | junrushao1994 wrote:
            | ROCm has improved a lot over the past few months, and ROCm
            | 5.6 now seems to work out of the box by just following this
            | tutorial: https://rocm.docs.amd.com/en/latest/deploy/linux/in
            | staller/i.... TVM Unity, the underlying compiler MLC LLM
            | uses, also seems to work out of the box on ROCm 5.6 -
            | according to Bohan Hou, who set up the environment.
        
             | JonChesterfield wrote:
             | Awesome. I'm going to paste that into the rocm dev channel.
             | Actual positive feedback on HN, novel and delightful. Thank
             | you for the blog post too!
        
         | tails4e wrote:
          | When you say best performance on NVIDIA, do you mean compared
          | against any other method of running this model on an NVIDIA
          | card?
        
           | junrushao1994 wrote:
            | Yeah, we tried out popular solutions that support inference
            | of 4-bit quantized models, like exllama and llama.cpp, among
            | others.
        
       ___________________________________________________________________
       (page generated 2023-08-09 23:00 UTC)