While Vulkan can be a good fallback, for LLM inference at least, the performance...

superkuh · on Sept 24, 2024

> ...so the extra effort is just installing ROCm? (vs the Vulkan devtools)

The problem with ROCm is that for non-bleeding-edge AMD cards you have to install an out-of-date, unsupported version of it, because the current version does not support your card. And that means containerization woes. If you're going to spend $800 on a top-of-the-line, current-generation video card anyway, then you'll have fewer problems (for a few years).

Also, the Vulkan vs. ROCm performance difference is smaller on cards that are neither bleeding-edge nor top-of-the-line.

coppsilgold · on Sept 24, 2024

The Radeon RX 7900 XTX is RDNA3, but I wonder whether llama.cpp's Vulkan backend is actually using the hardware matrix instructions (WMMA on RDNA3; MFMA exists only on the CDNA parts).

I have not noticed any remarkable differences between Vulkan and ROCm when using IREE but it's not a turnkey solution yet[1].

[1] <https://github.com/nod-ai/sharktank/blob/main/docs/model_coo...>
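One way to at least narrow down the matrix-instruction question is to see whether the installed driver advertises a cooperative-matrix extension at all. The following is a minimal sketch, assuming you feed it the text output of the `vulkaninfo` tool; the function name is made up for illustration, and a positive result only means the driver exposes the extension, not that llama.cpp uses it.

```python
import re

# Extension names as registered with Khronos. Whether a given backend
# actually uses them is a separate question this check cannot answer.
_COOP_MATRIX = re.compile(r"\bVK_(?:KHR|NV)_cooperative_matrix\b")


def cooperative_matrix_extensions(vulkaninfo_text):
    """Return the sorted cooperative-matrix extension names found in the text.

    `vulkaninfo_text` is assumed to be the captured output of `vulkaninfo`
    (hypothetical helper; not part of any library).
    """
    return sorted(set(_COOP_MATRIX.findall(vulkaninfo_text)))


if __name__ == "__main__":
    import sys

    # Usage: vulkaninfo | python check_coopmat.py
    found = cooperative_matrix_extensions(sys.stdin.read())
    print(found if found else "no cooperative-matrix extension advertised")
```

An empty list would suggest the Vulkan path has no access to the matrix units through a standard extension on that driver, whatever the application does.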

tormeh · on Sept 24, 2024

Any chance we might see Vulkan extensions to close this performance gap? Was really hoping Intel and AMD would team up to create an open standard that we could all have installed by default, but instead we get these clumsy vendor-specific solutions...

adrian_b · on Sept 24, 2024

I think that it is very unlikely that the performance difference is caused by anything that could be solved with a Vulkan extension.

Vulkan only exposes the raw compute capabilities of the hardware. Any well-optimized Vulkan application can reach the hardware's full performance, but you have to write that optimized code yourself.

On the other hand, ROCm, like CUDA, includes optimized libraries for certain applications, like rocBLAS.

It is likely that here the ROCm backend uses optimized library functions, perhaps from rocBLAS, while the Vulkan backend might use some generic functions for linear algebra, which are not optimized for the AMD GPUs.
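To make the "generic versus tuned" distinction concrete, here is a minimal sketch in plain Python of the same matrix product written two ways: a straightforward triple loop and a blocked (tiled) version. This is not how rocBLAS or the llama.cpp Vulkan backend is implemented; the function names and the `tile` parameter are hypothetical, and pure Python will not show the speedup. It only illustrates that the arithmetic is identical and what differs is the loop structure that a tuned library picks to match a particular GPU's caches and registers.

```python
def matmul_naive(a, b):
    """Generic triple-loop product of an n x k and a k x m matrix."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def matmul_tiled(a, b, tile=2):
    """Same product, computed block by block.

    `tile` stands in for the kind of hardware-specific tuning knob an
    optimized library would choose per GPU (assumption for illustration).
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work on one small block so its operands stay "close".
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        s = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            s += a[i][p] * b[p][j]
                        c[i][j] = s
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
assert matmul_naive(a, b) == matmul_tiled(a, b)
```

Both functions return the same result; on a real GPU the gap comes from choosing block sizes, memory layouts and instructions for the specific chip, which is the work a vendor library has already done.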