High-performance llama.cpp fork optimized for AMD Radeon Vega 7nm (MI50/MI60) GPUs. Features DPP-accelerated reductions, vectorized quantization kernels, and Q8-optimized flash attention, together delivering up to 18% faster inference on the GFX906 architecture. Based on llama.cpp build 7127.
Key modifications:

- `common.cuh`: DPP-based warp reductions with a unified shuffle-XOR dispatch (see the reduction sketch after this list)
- `fattn-common.cuh`: GCN-optimized thread counts and tile configurations
- `fattn.cu`: Q8-optimized tile-kernel selection for GFX906 flash attention
- `mmq.cu`: integrated GFX906 vectorized loads for Q4_0/Q4_1 quantizations (see the dequantization sketch below)
- `gfx906/`: new directory with MI50/MI60-specific kernel implementations
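On GCN, DPP (data-parallel primitives) modifiers let a VALU instruction read a neighboring lane's register directly, avoiding the LDS round-trip of `ds_swizzle`-based shuffles. As a rough illustration of the technique, here is a minimal sketch of a DPP wavefront sum for 64-wide waves, using clang's `__builtin_amdgcn_update_dpp` and the classic `row_shr`/`row_bcast` sequence from the GCN ISA docs; the helper names are hypothetical, not the fork's actual `common.cuh` code:

```cpp
#include <hip/hip_runtime.h>

// GCN DPP control encodings (GCN3/Vega ISA): row_shr:n = 0x110 | n,
// row_bcast:15 = 0x142, row_bcast:31 = 0x143.
#define DPP_ROW_SHR(n)  (0x110 | (n))
#define DPP_ROW_BCAST15 0x142
#define DPP_ROW_BCAST31 0x143

// Apply a DPP cross-lane move to x; lanes masked off by row_mask/bank_mask,
// or whose source lane is out of bounds, yield 0 so a plain add is safe.
template <int ctrl, int row_mask = 0xf, int bank_mask = 0xf>
__device__ __forceinline__ float dpp_mov_f32(const float x) {
    return __int_as_float(__builtin_amdgcn_update_dpp(
        0, __float_as_int(x), ctrl, row_mask, bank_mask, /*bound_ctrl=*/true));
}

// Classic 6-step DPP sum over a 64-lane wavefront: four row_shr steps reduce
// each 16-lane row, two row_bcast steps combine the rows; the total lands in
// lane 63 and is broadcast back to every lane with readlane.
__device__ __forceinline__ float warp_reduce_sum_dpp(float x) {
    x += dpp_mov_f32<DPP_ROW_SHR(1)>(x);
    x += dpp_mov_f32<DPP_ROW_SHR(2)>(x);
    x += dpp_mov_f32<DPP_ROW_SHR(4), 0xf, 0xe>(x);
    x += dpp_mov_f32<DPP_ROW_SHR(8), 0xf, 0xc>(x);
    x += dpp_mov_f32<DPP_ROW_BCAST15, 0xa>(x);
    x += dpp_mov_f32<DPP_ROW_BCAST31, 0xc>(x);
    return __int_as_float(__builtin_amdgcn_readlane(__float_as_int(x), 63));
}
```

A unified shuffle-XOR dispatch would presumably pick between DPP patterns like this and a generic `__shfl_xor` fallback depending on the reduction width.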
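For the `mmq.cu` item, the underlying idea is to pull packed nibbles in with wide loads and unpack them with SIMD-within-a-register masks rather than byte by byte. Below is a minimal sketch under that assumption; the struct mirrors ggml's Q4_0 block layout, but the function is illustrative rather than the fork's actual kernel code:

```cpp
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#include <cstdint>
#include <cstring>

// Q4_0 block as in ggml: one fp16 scale and 32 weights packed as 4-bit quants.
struct block_q4_0 {
    __half  d;       // scale
    uint8_t qs[16];  // 32 x 4-bit quants, two per byte
};

// Dequantize one block: four 32-bit loads (8 nibbles each) instead of 32
// byte loads; low nibbles map to weights 0..15, high nibbles to 16..31.
__device__ void dequantize_q4_0_block(const block_q4_0 *b, float *dst) {
    const float d = __half2float(b->d);
    #pragma unroll
    for (int i = 0; i < 4; ++i) {
        uint32_t v;
        memcpy(&v, b->qs + 4*i, sizeof(v));          // unaligned-safe 32-bit load
        const uint32_t lo =  v       & 0x0F0F0F0Fu;  // one low nibble per byte
        const uint32_t hi = (v >> 4) & 0x0F0F0F0Fu;  // one high nibble per byte
        #pragma unroll
        for (int j = 0; j < 4; ++j) {
            const int q_lo = (int)((lo >> (8*j)) & 0xFF) - 8;  // Q4_0 zero-point is 8
            const int q_hi = (int)((hi >> (8*j)) & 0xFF) - 8;
            dst[4*i + j]      = d * q_lo;
            dst[4*i + j + 16] = d * q_hi;
        }
    }
}
```

Matrix-multiply kernels typically keep the quants in registers and feed nibble pairs straight into dot-product instructions rather than materializing floats, but the masking pattern is the same.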
Compile-time feature flags:

```cpp
// Disable Split-K for GFX906
#define GFX906_FATTN_SPLIT_K_ENABLED 0
// Enable Q8 quantized flash attention
#define GFX906_FATTN_Q8_ENABLED 1
// Enable DPP-based warp reductions
#define GFX906_USE_DPP_REDUCTIONS 1
```

Build and launch:

```bash
git clone https://github.com/iacopPBK/llama.cpp-gfx906.git
cd llama.cpp-gfx906
./SCRIPT_compile_MI50.sh        # edit ROCM_PATH if not using /opt/rocm
./SCRIPT_launch_server_MI50.sh  # edit MODEL_PATH to your model file
```

Tested with a ROCm nightly build on GFX906 GPUs (MI50/MI60).
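Flags like the ones above typically select between kernel variants when the fork is compiled. Here is a minimal sketch of that gating pattern; the kernels and launcher are hypothetical placeholders, not the fork's actual `fattn.cu` dispatch:

```cpp
#include <hip/hip_runtime.h>

#define GFX906_FATTN_Q8_ENABLED 1  // as set in the fork's configuration

// Placeholder kernels standing in for the real tile variants.
__global__ void flash_attn_tile_f16() { /* baseline fp16 K/V tiles */ }
__global__ void flash_attn_tile_q8()  { /* Q8-quantized K/V tiles  */ }

// Pick the flash-attention tile kernel at compile time.
static void launch_flash_attn(dim3 grid, dim3 block, hipStream_t stream) {
#if GFX906_FATTN_Q8_ENABLED
    hipLaunchKernelGGL(flash_attn_tile_q8, grid, block, 0, stream);
#else
    hipLaunchKernelGGL(flash_attn_tile_f16, grid, block, 0, stream);
#endif
}
```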
Performance scales with the power limit; `SCRIPT_overclock_upp_MI50.sh` overclocks the MI50 via UPP (the PowerPlay table editor).
AMD GCN ISA ・ llama.cpp ・ ROCm ・ GFX906 Discord ・ wiki-gfx906 ・ llama-labs-gfx906
Built for the GFX906 community