llama.cpp-gfx906-2511

High-performance llama.cpp fork optimized for AMD Radeon Vega 7nm (MI50/MI60) GPUs. Features DPP-accelerated reductions, vectorized quantization kernels, and Q8-optimized flash attention, delivering up to 18% faster inference on the GFX906 architecture. Based on llama.cpp build 7127.

Benchmark Results

(Benchmark charts are provided as images in the repository.)

What Changed

common.cuh           DPP-based warp reductions with unified shuffle XOR dispatch
fattn-common.cuh     GCN-optimized thread counts and tile configurations
fattn.cu             Q8-optimized tile kernel selection for GFX906 flash attention
mmq.cu               Integrated GFX906 vectorized loads for Q4_0/Q4_1 quantizations
gfx906/              New directory with MI50/MI60-specific kernel implementations
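
The common.cuh change touches the hottest path: every softmax and dot-product accumulation funnels through a warp reduction. Below is a minimal sketch of that pattern, assuming upstream's shuffle-XOR butterfly; the fork's GFX906_USE_DPP_REDUCTIONS path presumably swaps each lane shuffle for a DPP lane permute, which keeps the data movement inside the VALU instead of routing it through LDS the way GCN's ds_bpermute-backed shuffles do. The function is illustrative, not the fork's actual code.

#include <hip/hip_runtime.h>

// XOR-butterfly reduction over a 64-lane GCN wavefront: after the last
// step, every lane holds the full sum.
__device__ float warp_reduce_sum(float x) {
#pragma unroll
    for (int stride = 32; stride > 0; stride >>= 1) {
        x += __shfl_xor(x, stride, 64);
    }
    return x;
}

The mmq.cu entry is simpler to picture: "vectorized loads" means fetching 16 bytes of packed Q4_0/Q4_1 quants with a single 128-bit load (global_load_dwordx4 on GCN) rather than four scalar loads. A hedged sketch, assuming a 16-byte-aligned source pointer (the fork's real tile loaders handle the actual block layout):

// One dwordx4 load of 32 packed 4-bit quants; src must be 16-byte aligned.
__device__ int4 load_q4_quants_vec(const int * __restrict__ src) {
    return *reinterpret_cast<const int4 *>(src);
}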

Compile-Time Configuration

// Disable Split-K for GFX906 
#define GFX906_FATTN_SPLIT_K_ENABLED 0

// Enable Q8 quantized flash attention
#define GFX906_FATTN_Q8_ENABLED 1

// Enable DPP-based warp reductions
#define GFX906_USE_DPP_REDUCTIONS 1
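
These switches are ordinary preprocessor flags, so flipping one requires a rebuild. A minimal sketch of how such a flag typically gates kernel selection follows; the launch_* names are placeholders, not the fork's real symbols from fattn.cu:

// Compile-time dispatch: the untaken branch is removed entirely, so no
// runtime check survives in the binary.
static void launch_fattn_tile_q8()  { /* placeholder: Q8 K/V tile kernel */ }
static void launch_fattn_tile_f16() { /* placeholder: generic FP16 path  */ }

void launch_flash_attention() {
#if GFX906_FATTN_Q8_ENABLED
    launch_fattn_tile_q8();
#else
    launch_fattn_tile_f16();
#endif
}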

Quick Start

git clone https://github.com/iacopPBK/llama.cpp-gfx906.git
cd llama.cpp-gfx906
./SCRIPT_compile_MI50.sh      # edit ROCM_PATH if not using /opt/rocm
./SCRIPT_launch_server_MI50.sh # edit MODEL_PATH to your model file

Tested with a ROCm nightly build on a GFX906 GPU (MI50/MI60).

Power Scaling

Performance scales with the power limit. SCRIPT_overclock_upp_MI50.sh overclocks the MI50 by editing its PowerPlay table with UPP (the PowerPlay Table Editor).

(Chart images: PP (prompt processing) and TG (token generation) performance across power limits.)

Links

  • AMD GCN ISA
  • llama.cpp
  • ROCm
  • GFX906 Discord
  • wiki-gfx906
  • llama-labs-gfx906

Built for the GFX906 community
