Releases · ggml-org/llama.cpp

05 Dec 16:00

6016d0b

b7285 Latest

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
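
For deployment scripts that currently assume .zip, one way to handle the format change is to dispatch on the file extension. The sketch below is a minimal illustration in Python and is not part of llama.cpp; the function name `extract_release` and the destination layout are assumptions made for the example.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_release(archive: str, dest: str) -> list[str]:
    """Extract a .zip or .tar.gz release archive into dest; return member names."""
    path = Path(archive)
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)

    if path.name.endswith((".tar.gz", ".tgz")):
        with tarfile.open(path, "r:gz") as tf:
            names = tf.getnames()
            # Use the safer "data" extraction filter where this Python has it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tf.extractall(out, **kwargs)
    elif path.suffix == ".zip":
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
            zf.extractall(out)
    else:
        raise ValueError(f"unsupported archive format: {path.name}")
    return names
```

A call such as `extract_release("llama-b7285-bin-ubuntu-x64.tar.gz", "/opt/llama")` (the destination path is hypothetical) would then work regardless of which format a given release ships.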

HIP : fix RDNA4 build (#17792)

Assets 22

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-12-05T16:00:13Z

llama-b7285-bin-macos-arm64.tar.gz
sha256:a762e2c6601bbbb6e4f138783f49063307d7f8e185b1d108386fecf0a08a0e16
13.2 MB 2025-12-05T16:00:31Z

llama-b7285-bin-macos-arm64.zip
sha256:3b3791e7b8b254581af03ab6da857d1f9464ba2d5539e609c5356884a7b3929e
13.2 MB 2025-12-05T16:00:32Z

llama-b7285-bin-macos-x64.tar.gz
sha256:97ab385ca6ae7f12802d942d6c495ea3a9790649400d457e23c208b15e18de68
35.2 MB 2025-12-05T16:00:34Z

llama-b7285-bin-macos-x64.zip
sha256:fe008efafe56d3b319786c8fe073cd719db486e473eb11a69c2f74b70fc63e56
35.2 MB 2025-12-05T16:00:36Z

llama-b7285-bin-ubuntu-s390x.tar.gz
sha256:a23615f4238ce7bc6857d087d63dd1a35af45d480b71a8c8cf0a5b15290c1497
17.2 MB 2025-12-05T16:00:39Z

llama-b7285-bin-ubuntu-s390x.zip
sha256:e7242c9f3708b405d28b3848d63224be70d8afdef7c066dc8c33b225eb7dd80e
15.1 MB 2025-12-05T16:00:40Z

llama-b7285-bin-ubuntu-vulkan-x64.tar.gz
sha256:10377b981a31040e81a8e5b65c4bb32747ea4bbe8b22c7d2d28968e15e36665b
30 MB 2025-12-05T16:00:42Z

llama-b7285-bin-ubuntu-vulkan-x64.zip
sha256:9cd76c0c0a690f9e3676448003f706d49d283b982c4138343d416742e49ef44c
30 MB 2025-12-05T16:00:45Z

llama-b7285-bin-ubuntu-x64.tar.gz
sha256:013cb13ccc4db585e5fdd664ec5766717d1745d2af808cddb5c0461695baca3d
15.3 MB 2025-12-05T16:00:47Z

Source code (zip)
2025-12-05T12:47:52Z

Source code (tar.gz)
2025-12-05T12:47:52Z

05 Dec 04:27

github-actions

b7278

03d9a77

b7278

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

ci : transform release binary root dir in tar to llama-bXXXX (#17773)

transform release binary root dir in tar to llama-bXXXX
bsdtar supports -s instead of --transform
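
The effect of this change, every path in the archive sitting under a single llama-bXXXX root directory, can be illustrated without either tar flavor. The following is a rough Python analogue using the standard `tarfile` module, not the CI's actual command; the function name `pack_with_root` and the source directory are assumptions made for the example.

```python
import tarfile
from pathlib import Path


def pack_with_root(src_dir: str, out_path: str, tag: str) -> list[str]:
    """Pack src_dir into a .tar.gz whose top-level directory is llama-<tag>."""
    root = f"llama-{tag}"
    with tarfile.open(out_path, "w:gz") as tf:
        # arcname replaces the on-disk directory name with the desired root,
        # which is the job --transform (GNU tar) or -s (bsdtar) does in a shell.
        tf.add(src_dir, arcname=root)
    with tarfile.open(out_path, "r:gz") as tf:
        return tf.getnames()
```

Consumers that extract such an archive get one predictable top-level directory per build tag instead of whatever the build directory happened to be called.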

05 Dec 01:21

github-actions

b7276

96fe9ba

b7276

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

Add support for CUMSUM and TRI for CUDA. (#17584)

Add support for CUMSUM and TRI for CUDA.
Minor optimizations.
Correct warp_prefix_inclusive_sum in float2 variant to return float2
Optimize TRI
Whitespace
Fix strides.
Implement double loop
Whitespace
Fix HIP compilation bugs
Optimizations + big case performance tests
Implement using CUB with fallback to custom kernel
Remove error message.
Fixes from code review
Comment out CPU-unsupported F16/BF16 cases to fix CI
Fine, you win :P
Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS
Vary warp-size based on physical warp size
Add GGML_UNUSED_VARS in tri as well
Use constexpr and call prefix_inclusive with warp_size template param
Update ggml/src/ggml-cuda/cumsum.cu

Co-authored-by: Johannes Gäßler johannesg@5d6.de

Apply suggestions from code review

Co-authored-by: Johannes Gäßler johannesg@5d6.de

Change to tid % warp_size
Fix strides; hardcode mask; add ggml_lane_mask_t
Missing renames, remove unused get_warp_mask(), explicit calls to ggml_cuda_info()
Too hasty...

Co-authored-by: Johannes Gäßler johannesg@5d6.de
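
The commit list mentions a warp-level inclusive prefix sum (`warp_prefix_inclusive_sum`) whose width follows the physical warp size. The kernel itself is not shown in these notes, so the following is only a sketch of the common shuffle-based scan technique, simulated on a Python list where each element stands for one lane. The function name `warp_inclusive_sum` is hypothetical, and the actual CUDA code may differ.

```python
def warp_inclusive_sum(values: list[float]) -> list[float]:
    """Inclusive prefix sum over one simulated warp (one list entry per lane)."""
    lanes = list(values)
    width = len(lanes)  # stands in for the warp size, e.g. 32 or 64
    offset = 1
    while offset < width:
        # Snapshot the previous step so all lanes read consistent values,
        # mimicking a shuffle-up where lane i reads from lane i - offset.
        prev = lanes[:]
        for lane in range(offset, width):
            lanes[lane] = prev[lane] + prev[lane - offset]
        offset *= 2
    return lanes
```

Each pass doubles the distance a lane reaches back, so a warp of width W finishes in about log2(W) steps rather than W sequential additions.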

04 Dec 23:04

github-actions

b7275

bde188d

b7275

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

metal: TRI, FILL, EXPM1, SOFTPLUS (#16623)

feat(wip): Port initial TRI impl from previous work

The kernel does not work and is not optimized, but the
code compiles and runs, so this will be the starting point
now that the core op has been merged.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

fix: Remove argument for constant val override

This was added in the original draft, but later removed. With this, the
kernel now passes tests.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

feat: Move the ttype conditional to templating to avoid conditional in kernel

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

fix: Type fixes

Signed-off-by: Gabe Goodhart ghart@us.ibm.com
Co-authored-by: Georgi Gerganov ggerganov@gmail.com

Co-authored-by: Georgi Gerganov ggerganov@gmail.com

feat: Add softplus for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

feat: Add EXPM1 for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

feat: Add FILL for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

refactor: Branchless version of tri using _ggml_vec_tri_cmp as a mask

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

fix: Remove unused arguments

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

refactor: Use select instead of branch for softplus non-vec

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

Signed-off-by: Gabe Goodhart ghart@us.ibm.com
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
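
Two of the refactors above replace branches with a mask or a select. The Metal kernels are not reproduced in these notes, so the sketch below only illustrates the general idea in Python. The function names, the `keep_diag` parameter, and the threshold of 20.0 are assumptions made for the example, not values taken from the source.

```python
import math


def tri_lower(rows, keep_diag=True):
    """Zero everything above the diagonal by multiplying with a 0/1 mask."""
    out = []
    for i, row in enumerate(rows):
        limit = i + 1 if keep_diag else i  # first column index to be zeroed
        # The comparison result is the mask; no per-element if/else is needed.
        out.append([x * float(j < limit) for j, x in enumerate(row)])
    return out


def softplus(x, threshold=20.0):
    """log(1 + exp(x)), evaluated on a clamped input and then selected."""
    soft = math.log1p(math.exp(min(x, threshold)))  # always safe to evaluate
    return x if x > threshold else soft             # pick between two ready values
```

One trade-off of a multiplicative mask is that infinite or NaN inputs in the masked region become NaN rather than zero, which a true select would avoid. For EXPM1, Python's `math.expm1` shows why a dedicated op is useful: it stays accurate for small x, where `exp(x) - 1` loses precision.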

04 Dec 22:05

github-actions

b7274

9d02299

b7274

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

server: strip content-length header on proxy (#17734)

04 Dec 21:17

github-actions

b7273

c4c10bf

b7273

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

server: move msg diffs tracking to HTTP thread (#17740)

server: move msg diffs tracking to HTTP thread
wip
tool call tests ok
minor : style
cont : fix
move states to server_response_reader
add safe-guard
fix
fix 2

Co-authored-by: Georgi Gerganov ggerganov@gmail.com

04 Dec 17:14

github-actions

b7271

bd4ef13

b7271

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

common : skip model validation when --help is requested (#17755)

This commit skips the model validation check when the user specifies the
--help option.

The motivation for this is that an error was previously thrown before
--help could be processed. Validation is now skipped if params.usage is set,
allowing help to display without requiring --model.

Resolves: #17754
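
The ordering problem described above can be sketched in a few lines. This is a simplified Python illustration, not the llama.cpp argument parser: only the `usage` flag name comes from the commit message, while `Params`, `parse_args`, and the error text are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:
    model: str = ""
    usage: bool = False  # set when --help is requested


def parse_args(argv: list[str]) -> Params:
    """Parse a tiny subset of flags, validating only when help was not requested."""
    params = Params()
    it = iter(argv)
    for arg in it:
        if arg in ("-h", "--help"):
            params.usage = True
        elif arg in ("-m", "--model"):
            params.model = next(it, "")
    # Skip the model check when usage is set, so --help works on its own.
    if not params.usage and not params.model:
        raise ValueError("--model is required")
    return params
```

With this ordering, `parse_args(["--help"])` returns normally, while `parse_args([])` still reports the missing model.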

04 Dec 16:27

github-actions

b7270

87a2084

b7270

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

ggml-cpu : remove asserts always evaluating to false (#17728)

04 Dec 14:03

github-actions

b7268

2a73f81

b7268

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

cmake : simplify build info detection using standard variables (#17423)

The current approach has several drawbacks. Mostly, when
cross-compiling, invoking the compiler binary directly to query the
machine hardware can behave unexpectedly depending on the toolchain
wrapper (using COMPILER_TARGET, CFLAGS, etc).

As CMake is the official tool to build llama.cpp, I propose to only rely
on it to get those variables (CMAKE_SYSTEM_NAME and
CMAKE_SYSTEM_PROCESSOR).

Signed-off-by: Adrien Gallouët angt@huggingface.co

04 Dec 12:06

github-actions

b7266

83c1171

b7266

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

common: use native MultiByteToWideChar (#17738)

std::codecvt_utf8<wchar_t> is deprecated and produces warnings:

common/common.cpp:792:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
  792 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |

Signed-off-by: Adrien Gallouët angt@huggingface.co
