Releases: ggml-org/llama.cpp

b7285

05 Dec 16:00
6016d0b

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

HIP : fix RDNA4 build (#17792)

b7278

05 Dec 04:27
03d9a77

ci : transform release binary root dir in tar to llama-bXXXX (#17773)

  • transform release binary root dir in tar to llama-bXXXX

  • bsdtar supports -s instead of --transform

b7276

05 Dec 01:21
96fe9ba

Add support for CUMSUM and TRI for CUDA. (#17584)

  • Add support for CUMSUM and TRI for CUDA.

  • Minor optimizations.

  • Correct warp_prefix_inclusive_sum in float2 variant to return float2

  • Optimize TRI

  • Whitespace

  • Fix strides.

  • Implement double loop

  • Whitespace

  • Fix HIP compilation bugs

  • Optimizations + big case performance tests

  • Implement using CUB with fallback to custom kernel

  • Remove error message.

  • Fixes from code review

  • Comment out CPU-unsupported F16/BF16 cases to fix CI

  • Fine, you win :P

  • Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS

  • Vary warp-size based on physical warp size

  • Add GGML_UNUSED_VARS in tri as well

  • Use constexpr and call prefix_inclusive with warp_size template param

  • Update ggml/src/ggml-cuda/cumsum.cu

  • Apply suggestions from code review

  • Change to tid % warp_size

  • Fix strides; hardcode mask; add ggml_lane_mask_t

  • Missing renames, remove unused get_warp_mask(), explicit calls to ggml_cuda_info()

  • Too hasty...

Co-authored-by: Johannes Gäßler johannesg@5d6.de
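
For orientation, the two operations can be sketched on the CPU. This is a minimal reference under stated assumptions, not the CUDA kernels from the PR: cumsum_row, warp_style_inclusive_scan, and tri_lower are hypothetical names, the scan only simulates the shuffle-based pattern that a warp-level inclusive prefix sum commonly uses, and the triangular mask shows just one variant (lower triangle including the diagonal).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential inclusive prefix sum over one row: out[i] = x[0] + ... + x[i].
std::vector<float> cumsum_row(const std::vector<float> & x) {
    std::vector<float> out(x.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i];
        out[i] = acc;
    }
    return out;
}

// CPU simulation of a warp-style inclusive scan (Hillis-Steele pattern):
// at each step every lane adds the value held by the lane `offset`
// positions below it, which is what a shuffle-up would deliver on a GPU.
std::vector<float> warp_style_inclusive_scan(std::vector<float> lanes) {
    for (std::size_t offset = 1; offset < lanes.size(); offset *= 2) {
        const std::vector<float> prev = lanes; // lane values before this step
        for (std::size_t lane = offset; lane < lanes.size(); ++lane) {
            lanes[lane] = prev[lane] + prev[lane - offset];
        }
    }
    return lanes;
}

// Keep the lower triangle (including the diagonal) of a row-major
// n x n matrix and zero everything above it.
std::vector<float> tri_lower(const std::vector<float> & m, std::size_t n) {
    assert(m.size() == n * n);
    std::vector<float> out(n * n, 0.0f);
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t c = 0; c <= r; ++c) {
            out[r * n + c] = m[r * n + c];
        }
    }
    return out;
}
```

The scan takes about log2(n) steps instead of n, which is why this pattern suits a warp where all lanes advance together; the sequential version is the easier one to check results against.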

b7275

04 Dec 23:04
bde188d

metal: TRI, FILL, EXPM1, SOFTPLUS (#16623)

  • feat(wip): Port initial TRI impl from previous work

The kernel does not work and is not optimized, but the
code compiles and runs, so this will be the starting point
now that the core op has been merged.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • fix: Remove argument for constant val override

This was added in the original draft, but later removed. With this, the
kernel now passes tests.

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • feat: Move the ttype conditional to templating to avoid conditional in kernel

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • fix: Type fixes

Signed-off-by: Gabe Goodhart ghart@us.ibm.com
Co-authored-by: Georgi Gerganov ggerganov@gmail.com

  • feat: Add softplus for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • feat: Add EXPM1 for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • feat: Add FILL for metal

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • refactor: Branchless version of tri using _ggml_vec_tri_cmp as a mask

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • fix: Remove unused arguments

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com

  • refactor: Use select instead of branch for softplus non-vec

Branch: ggml-cumsum-tri

Signed-off-by: Gabe Goodhart ghart@us.ibm.com


Signed-off-by: Gabe Goodhart ghart@us.ibm.com
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
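
As a rough guide to what these four ops compute, here is a scalar C++ reference. It is a sketch under assumptions, not the Metal kernels from the PR: the function names are hypothetical, the softplus cutoff of 20 is an assumed value chosen for illustration, and only the lower-triangle-with-diagonal case of TRI is shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// softplus(x) = log(1 + exp(x)). The general-case value is always computed
// and the final result is chosen with a single select rather than an early
// return. The cutoff of 20 is an assumption for this sketch: above it the
// result equals x to single precision, and clamping keeps exp() finite.
float softplus_ref(float x) {
    const float cutoff  = 20.0f;
    const float general = std::log1p(std::exp(std::fmin(x, cutoff)));
    return x > cutoff ? x : general;
}

// expm1(x) = exp(x) - 1, computed in a way that stays accurate near zero.
float expm1_ref(float x) {
    return std::expm1(x);
}

// FILL: set every element to the same constant.
void fill_ref(std::vector<float> & dst, float value) {
    for (float & v : dst) {
        v = value;
    }
}

// TRI with the comparison used as a mask: each element is selected or
// zeroed based on its (row, column) position instead of taking separate
// code paths per triangle type.
void tri_mask_ref(std::vector<float> & m, std::size_t n) {
    assert(m.size() == n * n);
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t c = 0; c < n; ++c) {
            const bool keep = c <= r;
            m[r * n + c] = keep ? m[r * n + c] : 0.0f;
        }
    }
}
```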

b7274

04 Dec 22:05
9d02299

server: strip content-length header on proxy (#17734)
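
The idea in the title can be illustrated with a small header filter. This is a sketch only: header_list and strip_content_length are hypothetical names, and the server's actual proxy code is not shown in this release note. One plausible reason for a proxy to do this is that the forwarding layer frames the body itself (for example when streaming), so a copied length could disagree with what is actually sent.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of a header set as (name, value) pairs.
using header_list = std::vector<std::pair<std::string, std::string>>;

// HTTP header names are case-insensitive, so compare them that way.
static bool iequals(const std::string & a, const std::string & b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// Copy the upstream headers, dropping any Content-Length entry so the
// forwarding side can frame the response body on its own.
header_list strip_content_length(const header_list & upstream) {
    header_list out;
    for (const auto & header : upstream) {
        if (!iequals(header.first, "Content-Length")) {
            out.push_back(header);
        }
    }
    return out;
}
```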

b7273

04 Dec 21:17
c4c10bf

server: move msg diffs tracking to HTTP thread (#17740)

  • server: move msg diffs tracking to HTTP thread

  • wip

  • tool call tests ok

  • minor : style

  • cont : fix

  • move states to server_response_reader

  • add safe-guard

  • fix

  • fix 2


Co-authored-by: Georgi Gerganov ggerganov@gmail.com

b7271

04 Dec 17:14
bd4ef13

common : skip model validation when --help is requested (#17755)

This commit skips the model validation check when the user specifies the
--help option.

The motivation is that an error is currently thrown before --help can be
processed. Validation is now skipped if params.usage is set, allowing
help to display without requiring --model.

Resolves: #17754
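
The control flow described above can be sketched as follows. Only the usage flag and the --model requirement come from the commit message; cli_params and validate_params are hypothetical names, and the real validation in common covers more than this.

```cpp
#include <string>

// Hypothetical, reduced stand-in for the parsed command-line parameters.
struct cli_params {
    bool        usage = false; // set when --help was requested
    std::string model;         // value of --model, empty if not given
};

// Returns true when the parameters are acceptable. When help was requested
// the model check is skipped, so usage can be printed without --model.
bool validate_params(const cli_params & params) {
    if (params.usage) {
        return true; // help requested: nothing to validate
    }
    return !params.model.empty();
}
```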

b7270

04 Dec 16:27
87a2084

ggml-cpu : remove asserts always evaluating to false (#17728)

b7268

04 Dec 14:03
2a73f81

cmake : simplify build info detection using standard variables (#17423)

The current approach has several drawbacks. Mostly, when
cross-compiling, invoking the compiler binary directly to query the
machine hardware can behave unexpectedly depending on the toolchain
wrapper (using COMPILER_TARGET, CFLAGS, etc).

As CMake is the official tool to build llama.cpp, I propose to only rely
on it to get those variables (CMAKE_SYSTEM_NAME and
CMAKE_SYSTEM_PROCESSOR).

Signed-off-by: Adrien Gallouët angt@huggingface.co

b7266

04 Dec 12:06
83c1171

common: use native MultiByteToWideChar (#17738)

std::codecvt_utf8<wchar_t> is deprecated and produces warnings:

common/common.cpp:792:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
  792 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |

Signed-off-by: Adrien Gallouët angt@huggingface.co
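
For readers unfamiliar with the native call, a typical two-pass use of MultiByteToWideChar looks roughly like this. It is a sketch, not the code from common/common.cpp: utf8_to_wide is a hypothetical name, and the non-Windows branch is an ASCII-only placeholder that exists solely so the example builds on other platforms.

```cpp
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Convert a UTF-8 string to a wide string.
std::wstring utf8_to_wide(const std::string & utf8) {
    if (utf8.empty()) {
        return std::wstring();
    }
#ifdef _WIN32
    // First call: ask how many wide characters the result needs.
    const int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                        utf8.data(), (int) utf8.size(),
                                        nullptr, 0);
    if (len <= 0) {
        throw std::runtime_error("invalid UTF-8 input");
    }
    std::wstring wide(len, L'\0');
    // Second call: convert into the correctly sized buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), (int) utf8.size(),
                        &wide[0], len);
    return wide;
#else
    // Placeholder for non-Windows builds of this sketch: ASCII only.
    std::wstring wide;
    for (unsigned char ch : utf8) {
        if (ch >= 0x80) {
            throw std::invalid_argument("non-ASCII input needs a real decoder");
        }
        wide.push_back(static_cast<wchar_t>(ch));
    }
    return wide;
#endif
}
```

Passing an explicit byte count rather than -1 means the returned length excludes a terminating null, so the std::wstring can be sized to exactly len.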
