It seems fine so far running some dev workloads invoked as:
llama-server -m gpt-oss-20b-mxfp4.gguf -ngl 99 -fa on --ctx-size 0 --jinja --host ::0
Seeing a nice performance increase too.
Previous version - libggml-0.9.4pl20251120 & llama.cpp-0.0.7086:
llama-bench -m models/gpt-oss-20b-mxfp4.gguf -fa on -ngl 99
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/local/lib/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/local/lib/libggml-cpu-haswell.so
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | pp512 | 1839.89 ± 34.88 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | tg128 | 129.02 ± 0.25 |
llama-bench -m models/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 -ngl 99 -ncmoe 8
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/local/lib/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/local/lib/libggml-cpu-haswell.so
| model | size | params | backend | ngl | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 99 | 1 | pp512 | 69.25 ± 2.18 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 99 | 1 | tg128 | 16.97 ± 0.04 |
This version - libggml-0.9.5pl20260130 & llama.cpp-0.0.7883:
llama-bench -m models/gpt-oss-20b-mxfp4.gguf -fa on -ngl 99
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/local/lib/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/local/lib/libggml-cpu-haswell.so
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | pp512 | 2405.75 ± 25.05 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | tg128 | 128.82 ± 0.35 |
llama-bench -m models/gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 -ngl 99 -ncmoe 8
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/local/lib/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/local/lib/libggml-cpu-haswell.so
| model | size | params | backend | ngl | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 99 | 1 | pp512 | 79.29 ± 1.72 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Vulkan | 99 | 1 | tg128 | 17.21 ± 0.10 |
No comments:
Post a Comment