[HN Gopher] Making AMD GPUs competitive for LLM inference
___________________________________________________________________
 
Making AMD GPUs competitive for LLM inference
 
Author : djoldman
Score  : 51 points
Date   : 2023-08-09 18:15 UTC (4 hours ago)
 
(HTM) web link (blog.mlc.ai)
(TXT) w3m dump (blog.mlc.ai)
 
| tails4e wrote:
| Being 90% of a 4090 makes the 7900 XTX very attractive from a
| cost-per-compute perspective, as it costs about 65% as much, and
| its power draw is significantly lower too.
 
  | nkingsy wrote:
  | Brutal slandering of AMD marketing tactics disguised as a
  | benchmark summary:
  |
  | https://gpu.userbenchmark.com/AMD-RX-7900-XT/Rating/4141
 
    | zdw wrote:
    | There's a reason this site is banned from most hardware
    | forums.
 
    | GaggiX wrote:
    | It has been a while since I saw anyone send a UserBenchmark
    | link. UserBenchmark is not a site with a good reputation;
    | you can find many explanations online for why this is, but I
    | guess the site moderators just ignore them by dismissing it
    | all as "Advanced Marketing". Meanwhile, even AMD's CPU
    | competitor has banned the site from its subreddit. One of my
    | favorite explanations: https://youtu.be/RQSBj2LKkWg, but
    | there are of course more recent ones.
 
    | jacoblambda wrote:
    | Really? You're referring to UserBenchmark's assessment of
    | AMD?
    |
    | Look at literally any of their assessments of AMD products
    | and they'll be dragging them through the mud while pointing
    | to worse-performing Intel products to flex how much better
    | they are because they aren't AMD.
 
| superkuh wrote:
| > AMD GPUs using ROCm
|
| Oh great. The AMD RX 580 was released in April 2017. AMD had
| already dropped ROCm support for it by 2021. They only supported
| the card for about four years. Four years. It's so lame it's
| bordering on fraudulent, even if not legally fraud. Keep this in
| mind when reading this news. The support won't last long,
| especially if you don't buy at launch. Then you'll be stuck in
| the dependency hell that is trying to use an old driver stack.
 
  | crowwork wrote:
  | There is also Vulkan support, which should be more universal
  | (it's also covered in the post); for example, the post shows
  | running an LLM on a Steam Deck APU.
 
  | junrushao1994 wrote:
  | TBH I'm not sure what AMD's plan is for ROCm support on
  | consumer devices, but I don't really think AMD is being
  | fraudulent or anything.
  |
  | Both ROCm and Vulkan are supported in MLC LLM, as mentioned in
  | our blog post. We are aware that ROCm is not sufficient to
  | cover consumer hardware, and in that case Vulkan is a nice
  | backup!
 
    | zorgmonkey wrote:
    | How does the performance with Vulkan compare to the ROCm
    | performance on the same hardware?
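 
To make the ROCm/Vulkan discussion above concrete, here is a
minimal sketch of trying the blog's prebuilt Python package on an
AMD card. The package name mlc_chat, the ChatModule class, its
model/device arguments, and the wheel names are assumptions drawn
from MLC LLM's documentation of this period; none of them are
confirmed anywhere in this thread.
 
    # Minimal sketch: single-batch chat on an AMD GPU via MLC LLM's
    # prebuilt package. All names below (mlc_chat, ChatModule, the
    # model string, the device argument) are assumptions, not
    # something this thread confirms.
    #
    # Assumed install for the ROCm 5.6 wheels:
    #   pip install --pre mlc-ai-nightly-rocm56 mlc-chat-nightly-rocm56 \
    #       -f https://mlc.ai/wheels
 
    from mlc_chat import ChatModule
 
    # "rocm" targets AMD GPUs through the ROCm stack; on consumer
    # hardware ROCm does not cover, "vulkan" is the more universal
    # fallback mentioned upthread (e.g. the Steam Deck APU).
    cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device="rocm")
 
    print(cm.generate(prompt="Why are AMD GPUs viable for LLM serving?"))
 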
| junrushao1994 wrote:
| One of the authors here. Glad it's on Hacker News!
|
| There are two points I personally wanted to make through this
| project:
|
| 1) With a sufficiently optimized software stack, AMD GPUs can be
| cost-efficient to use in LLM serving; 2) ML compilation (MLC)
| techniques, through the underlying TVM Unity software stack, are
| the best fit for delivering generalizable performance
| optimizations across hardware and quick time-to-market value.
|
| So far, to the best of our knowledge, MLC LLM delivers the best
| performance across NVIDIA and AMD GPUs in single-batch inference
| on quantized models, and batched/distributed inference is on the
| horizon too.
 
  | JonChesterfield wrote:
  | Did the ROCm 5.6 toolchain work for you out of the box? If
  | not, what sort of hacking / hand-holding did it need?
  |
  | I don't know whether there's an LLM inference benchmark in the
  | CI suite; if not, perhaps something like this should be
  | included in it.
 
    | crowwork wrote:
    | Yes, it works out of the box, and the blog contains a
    | prebuilt Python package that you can try out.
 
    | junrushao1994 wrote:
    | ROCm has improved a lot over the past few months, and now
    | ROCm 5.6 seems to work out of the box by just following this
    | tutorial:
    | https://rocm.docs.amd.com/en/latest/deploy/linux/installer/i...
    | TVM Unity, the underlying compiler MLC LLM uses, seems to
    | work out of the box on ROCm 5.6 too - per Bohan Hou, who set
    | up the environment.
 
      | JonChesterfield wrote:
      | Awesome. I'm going to paste that into the ROCm dev
      | channel. Actual positive feedback on HN - novel and
      | delightful. Thank you for the blog post too!
 
  | tails4e wrote:
  | When you say best performance on NVIDIA, do you mean against
  | any other method of running this model on an NVIDIA card?
 
    | junrushao1994 wrote:
    | Yeah, we tried out popular solutions like exllama and
    | llama.cpp, among others that support inference of 4-bit
    | quantized models.
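 
zorgmonkey's ROCm-vs-Vulkan question goes unanswered in the
thread. Under the same assumed mlc_chat API as the sketch above, a
rough way to measure it yourself is to time a fixed single-batch
prompt on each backend; the stats() call below is likewise an
assumption about the package, not something the thread confirms.
 
    # Rough single-batch throughput comparison between the ROCm
    # and Vulkan backends on the same card. Same caveat as before:
    # the mlc_chat API used here is assumed, not confirmed.
 
    import time
 
    from mlc_chat import ChatModule
 
    PROMPT = "Summarize why ML compilation helps cross-vendor GPU support."
 
    for device in ("rocm", "vulkan"):
        cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device=device)
        start = time.perf_counter()
        cm.generate(prompt=PROMPT)
        elapsed = time.perf_counter() - start
        # stats() is assumed to report prefill/decode tokens/sec.
        print(f"{device}: {elapsed:.1f}s wall clock; {cm.stats()}")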