Releases: ggml-org/llama.cpp
b7285
Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
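A deployment script that assumes .zip can be made tolerant of both formats during the transition. The following is a minimal sketch, assuming the archive has already been downloaded to disk; the function name `extract_release` and its arguments are hypothetical and are not taken from the release assets.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_release(archive: Path, dest: Path) -> None:
    """Extract a release archive, whether it is a .zip or a .tar.gz."""
    dest.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    elif tarfile.is_tarfile(archive):
        # Mode "r:*" lets tarfile detect the compression (gzip here).
        with tarfile.open(archive, "r:*") as tf:
            tf.extractall(dest)
    else:
        raise ValueError(f"unsupported archive format: {archive}")
```

Detecting the format from the file contents rather than the extension means the same script keeps working before and after the switch.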
HIP : fix RDNA4 build (#17792)
b7278
ci : transform release binary root dir in tar to llama-bXXXX (#17773)
- transform release binary root dir in tar to llama-bXXXX
- bsdtar supports -s instead of --transform
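The effect of this change, an archive whose single top-level directory is named after the build tag, can be illustrated outside of tar. This is a Python analogue under stated assumptions, not the CI script: the function name `pack_with_root` and the example tag are hypothetical.

```python
import tarfile
from pathlib import Path


def pack_with_root(src_dir: Path, out: Path, root: str) -> None:
    """Write src_dir into a .tar.gz whose top-level directory is `root`."""
    with tarfile.open(out, "w:gz") as tf:
        # arcname replaces the on-disk directory name inside the archive,
        # similar in effect to a tar path transform.
        tf.add(src_dir, arcname=root)
```

Calling `pack_with_root(Path("build"), Path("release.tar.gz"), "llama-b0000")` would store every member under `llama-b0000/`, regardless of what the source directory is called on disk.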
b7276
Add support for CUMSUM and TRI for CUDA. (#17584)
- Add support for CUMSUM and TRI for CUDA.
- Minor optimizations.
- Correct warp_prefix_inclusive_sum in float2 variant to return float2
- Optimize TRI
- Whitespace
- Fix strides.
- Implement double loop
- Whitespace
- Fix HIP compilation bugs
- Optimizations + big case performance tests
- Implement using CUB with fallback to custom kernel
- Remove error message.
- Fixes from code review
- Comment out CPU-unsupported F16/BF16 cases to fix CI
- Fine, you win :P
- Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS
- Vary warp-size based on physical warp size
- Add GGML_UNUSED_VARS in tri as well
- Use constexpr and call prefix_inclusive with warp_size template param
- Update ggml/src/ggml-cuda/cumsum.cu
Co-authored-by: Johannes Gäßler johannesg@5d6.de
- Apply suggestions from code review
Co-authored-by: Johannes Gäßler johannesg@5d6.de
- Change to tid % warp_size
- Fix strides; hardcode mask; add ggml_lane_mask_t
- Missing renames, remove unused get_warp_mask(), explicit calls to ggml_cuda_info()
- Too hasty...
Co-authored-by: Johannes Gäßler johannesg@5d6.de
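For readers unfamiliar with warp-level prefix sums, the general pattern can be simulated on the CPU. This is a sketch of the common log-step inclusive scan, assuming warp_prefix_inclusive_sum follows that pattern; the kernel itself is not shown in these notes, and the name `inclusive_scan` is hypothetical.

```python
def inclusive_scan(lanes):
    """Inclusive prefix sum over one 'warp' of values in about log2(n) steps."""
    vals = list(lanes)
    n = len(vals)
    offset = 1
    while offset < n:
        # Each lane i >= offset adds the value held by lane i - offset,
        # reading from the previous step's values (like a shuffle-up).
        prev = vals[:]
        for i in range(offset, n):
            vals[i] = prev[i] + prev[i - offset]
        offset *= 2
    return vals
```

For example, `inclusive_scan([1, 2, 3, 4])` returns `[1, 3, 6, 10]`. On a GPU the inner loop runs in parallel across lanes, so a 32-lane warp needs five steps.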
b7275
metal: TRI, FILL, EXPM1, SOFTPLUS (#16623)
- feat(wip): Port initial TRI impl from previous work
The kernel does not work and is not optimized, but the
code compiles and runs, so this will be the starting point
now that the core op has been merged.
Branch: ggml-cumsum-tri
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
- fix: Remove argument for constant val override
This was added in the original draft, but later removed. With this, the
kernel now passes tests.
Branch: ggml-cumsum-tri
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
- feat: Move the ttype conditional to templating to avoid conditional in kernel
Branch: ggml-cumsum-tri
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
- fix: Type fixes
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
- feat: Add softplus for metal
Branch: ggml-cumsum-tri
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
- feat: Add EXPM1 for metal
Branch: ggml-cumsum-tri
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
- feat: Add FILL for metal
Branch: ggml-cumsum-tri
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
- refactor: Branchless version of tri using _ggml_vec_tri_cmp as a mask
Branch: ggml-cumsum-tri
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
- fix: Remove unused arguments
Branch: ggml-cumsum-tri
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
- refactor: Use select instead of branch for softplus non-vec
Branch: ggml-cumsum-tri
Signed-off-by: Gabe Goodhart ghart@us.ibm.com
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
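The ops named in this release are simple to state on the CPU. The sketch below is illustrative only and is not the Metal kernel code: the softplus cutoff of 20.0 is an assumption, `tri_lower` is a hypothetical name, and the mask multiplication merely mimics the idea of applying a comparison result instead of branching per element.

```python
import math


def softplus(x: float, threshold: float = 20.0) -> float:
    # For large x, log(1 + exp(x)) is numerically equal to x; the cutoff
    # value is an assumption made for this sketch.
    return x if x > threshold else math.log1p(math.exp(x))


def tri_lower(matrix, keep_diagonal=True):
    """Zero everything above the (optionally included) diagonal.

    The keep/zero decision is computed as a 0/1 mask and multiplied in.
    Note that this differs from a true select for NaN or infinite inputs.
    """
    shift = 0 if keep_diagonal else 1
    return [
        [v * float(c + shift <= r) for c, v in enumerate(row)]
        for r, row in enumerate(matrix)
    ]
```

EXPM1 corresponds to `math.expm1(x)`, which evaluates exp(x) - 1 without the cancellation error that the naive expression suffers near zero, and FILL simply writes one constant to every element.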
b7274
server: strip content-length header on proxy (#17734)
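The note above only states that the header is stripped. A likely motivation, which is an assumption here, is that a proxy re-framing or streaming the body should not forward the upstream length. A minimal sketch of such a filter, with the hypothetical name `strip_content_length`:

```python
def strip_content_length(headers):
    """Return a copy of the headers without Content-Length (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() != "content-length"}
```

Header names are compared in lower case because HTTP header field names are case-insensitive.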
b7273
server: move msg diffs tracking to HTTP thread (#17740)
- server: move msg diffs tracking to HTTP thread
- wip
- tool call tests ok
- minor : style
- cont : fix
- move states to server_response_reader
- add safe-guard
- fix
- fix 2
Co-authored-by: Georgi Gerganov ggerganov@gmail.com
b7271
common : skip model validation when --help is requested (#17755)
This commit skips the model validation check when the user specifies the
--help option.
Previously, an error was thrown before --help could be processed. Validation
is now skipped if params.usage is set, so help is displayed without requiring
--model.
Resolves: #17754
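The ordering issue described above can be shown with a small argument parser. This is a sketch of the control flow only, not the llama.cpp code: the `Params` class, the `parse` function, and the error message are hypothetical, while the `usage` field and the --help and --model flags come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Params:
    usage: bool = False  # set when --help is requested
    model: str = ""      # path given via --model


def parse(argv):
    params = Params()
    it = iter(argv)
    for arg in it:
        if arg == "--help":
            params.usage = True
        elif arg == "--model":
            params.model = next(it, "")
    # Validate only when help was not requested, so that --help works
    # without --model.
    if not params.usage and not params.model:
        raise ValueError("--model is required")
    return params
```

With this guard, `parse(["--help"])` returns normally and the caller can print usage, while `parse([])` still fails validation.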
b7270
ggml-cpu : remove asserts always evaluating to false (#17728)
b7268
cmake : simplify build info detection using standard variables (#17423)
The current approach has several drawbacks. Mostly, when
cross-compiling, invoking the compiler binary directly to query the
machine hardware can behave unexpectedly depending on the toolchain
wrapper (using COMPILER_TARGET, CFLAGS, etc).
As CMake is the official tool to build llama.cpp, I propose to only rely
on it to get those variables (CMAKE_SYSTEM_NAME and
CMAKE_SYSTEM_PROCESSOR).
Signed-off-by: Adrien Gallouët angt@huggingface.co
b7266
common: use native MultiByteToWideChar (#17738)
std::codecvt_utf8<wchar_t> is deprecated and produces warnings:
    common/common.cpp:792:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
      792 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
          |
Signed-off-by: Adrien Gallouët angt@huggingface.co