While Vulkan can be a good fallback, the performance difference is not as insignificant as you believe, at least for LLM inference. I just ran a test on llama.cpp HEAD to make sure this is still the case: text generation is 44% faster and prompt processing is 202% faster (~3X) with ROCm than with Vulkan.
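For anyone reading "+202%" and "~3X" as two different claims, here is a minimal sketch of the arithmetic. The helper name is hypothetical, and the only inputs are the two percentages quoted above, not raw benchmark numbers.

```python
def speedup_from_percent_faster(percent_faster: float) -> float:
    """Convert '+N% faster' into a throughput ratio (new / old)."""
    return 1.0 + percent_faster / 100.0


# Percentages quoted in the comment (ROCm relative to Vulkan).
text_generation = speedup_from_percent_faster(44)
prompt_processing = speedup_from_percent_faster(202)

print(f"text generation:   {text_generation:.2f}x")
print(f"prompt processing: {prompt_processing:.2f}x")
```

So +44% is a 1.44x ratio, and +202% is a 3.02x ratio, which is where the "~3X" comes from.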
Note: if you're building llama.cpp, all you have to do is swap GGML_HIPBLAS=1 and GGML_VULKAN=1 so the extra effort is just installing ROCm? (vs the Vulkan devtools)
> ...so the extra effort is just installing ROCm? (vs the Vulkan devtools)
The problem with ROCm is that for non-bleeding-edge AMD cards you have to install an out-of-date, unsupported version of it, because the current version does not support your card. And that means containerization woes. If you're going to spend $800 on a top-of-the-line, current-generation video card anyway, then you'll have fewer problems (for a few years).
Also, the Vulkan vs. ROCm performance difference is smaller for cards that are neither bleeding-edge nor top-of-the-line.
Any chance we might see Vulkan extensions to close this performance gap? I was really hoping Intel and AMD would team up to create an open standard that we could all have installed by default, but instead we get these clumsy vendor-specific solutions...
I think that it is very unlikely that the performance difference is caused by anything that could be solved with a Vulkan extension.
Vulkan only exposes the raw compute capabilities of the hardware. Any well-optimized Vulkan application can reach full performance, but someone has to write that optimized code.
On the other hand, ROCm, like CUDA, includes optimized libraries for certain applications, like rocBLAS.
It is likely that here the ROCm backend uses optimized library functions, perhaps from rocBLAS, while the Vulkan backend might use some generic functions for linear algebra, which are not optimized for the AMD GPUs.
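To make the "generic vs. optimized" distinction concrete, here is a rough sketch in plain Python. It is only an illustration of the idea: it is not how rocBLAS or either llama.cpp backend is actually written, the function names are made up, and the tile size is an arbitrary assumption. Both functions compute the same matrix product; the second walks the matrices in small blocks, which is the kind of restructuring tuned libraries do (with hardware-specific block sizes) to keep data in fast memory.

```python
def matmul_naive(a, b):
    """Generic triple loop: correct, but no attention to memory locality."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same math, processed block by block (tile size is an assumption)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work on one small block of the output at a time.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_naive(a, b))
print(matmul_tiled(a, b))
```

In pure Python the tiled version will not be faster. The point is only that the result is identical while the access pattern differs, and on a GPU that access pattern is where much of the performance comes from.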
ROCm:
Vulkan:
EDIT: HN should really support markdown...