Commits (433)
3d7b227
common : use cpp-httplib as a cURL alternative for downloads (#16185)
angt Sep 26, 2025
667d5f4
metal : report OOM errors (#16274)
ggerganov Sep 26, 2025
ff84e4d
mtmd : fix uninitialized variable in bicubic_resize (#16275)
AlekseiNikiforovIBM Sep 26, 2025
2f0f872
codeowners : add rgerganov as owner of RPC [no ci] (#16279)
rgerganov Sep 26, 2025
9f08f25
Always show message actions for mobile UI + improvements for user mes…
allozaur Sep 26, 2025
9a25257
webui: switch to hash-based routing (alternative of #16079) (#16157)
isaac-mcfadyen Sep 26, 2025
617549f
Allow viewing conversations even when llama server is down (#16255)
allozaur Sep 26, 2025
e03fa1d
Enhance text file detection logic for file attachments (#16199)
allozaur Sep 26, 2025
807f6f6
devops: add s390x & ppc64le CI (#15925)
taronaeo Sep 26, 2025
6d4a32e
model : make minicpm embedding_scale, residual_scale and logit_scale …
vinkal-chudgar Sep 26, 2025
ce07a80
build : add LLAMA_OPENSSL option (#16287)
angt Sep 27, 2025
b86b3bf
vulkan: support GET_ROWS for k-quants (#16235)
jeffbolznv Sep 27, 2025
e3638af
server : remove old LLAMA_SERVER_SSL (#16290)
angt Sep 27, 2025
6e124be
vulkan: throw system error instead of SIGABRT during init on older de…
DmyMi Sep 27, 2025
ba67302
CUDA: refactor and deduplicate vector FA kernels (#16208)
JohannesGaessler Sep 27, 2025
9e9c76e
CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#…
am17an Sep 27, 2025
48d6dc4
Show message actions by default (#16289)
allozaur Sep 27, 2025
eae567b
vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16…
Acly Sep 27, 2025
71edc9d
vulkan: support arbitrary KV dimension in flash attention (#16160)
jeffbolznv Sep 27, 2025
e4e91bc
vulkan: handle mat_mul with A matrix > 4GB (#16176)
jeffbolznv Sep 28, 2025
dc861d2
metal : fuse non-sequential nodes (#16102)
ggerganov Sep 28, 2025
967c966
metal : extend mat-mat multiplication support (#16225)
ggerganov Sep 28, 2025
70042d5
vulkan: 64-bit im2col (#16135)
jeffbolznv Sep 28, 2025
f21a0aa
Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] …
ImadSaddik Sep 28, 2025
8e80a01
devops: switch to using ubuntu-22.04-s390x image (#16302)
taronaeo Sep 28, 2025
db49b1c
ci : fix musa docker build (#16306)
yeahdongcn Sep 28, 2025
8a7500b
common : fix reasoning before forced tool call via tool_choice = requ…
crat0z Sep 28, 2025
94ee2c0
ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (#16307)
CISC Sep 28, 2025
a62b221
vulkan: Fix validation failure in quantized flash attention (#16292)
jeffbolznv Sep 29, 2025
97905b2
ggml : fix dependencies for ggml_set_rows (#16318)
ggerganov Sep 29, 2025
e688dc3
perplexity : show more kl-divergence data (#16321)
ddh0 Sep 29, 2025
d751670
llama-cli: prevent spurious assistant token (#16202)
vinkal-chudgar Sep 29, 2025
1a60170
fix: preserved zero values in chat settings inputs and textareas by s…
ServeurpersoCom Sep 29, 2025
2ca531e
Improve Mobile UI for dialogs and action dropdowns (#16222)
allozaur Sep 29, 2025
0fec081
ggml : check cuda and metal argsort limits and add test (#16323)
CISC Sep 29, 2025
3a6e259
ggml-backend : add root cause in error message if loading backend lib…
rlewczuk Sep 29, 2025
990758b
ggml : bump version to 0.9.1
ggerganov Sep 20, 2025
e5e210e
ggml : prepare for development of 0.9.2-dev
ggerganov Sep 20, 2025
0c4c806
ggml : bump version to 0.9.3 (ggml/1353)
danbev Sep 25, 2025
0996bd5
ggml : remove -dev suffix from release version (ggml/1355)
danbev Sep 26, 2025
a494252
sync : whisper.cpp (ggml/1359)
ggerganov Sep 29, 2025
6fa8591
sync : ggml
ggerganov Sep 29, 2025
de31425
ggml: riscv: add riscv spacemit backend (#15288)
alex-spacemit Sep 29, 2025
a2af773
ci : add AMD runners and workflows (#16249)
ggerganov Sep 29, 2025
7e6bba9
Fix thinking blocks with quotes + add handling `[THINK]...[/THINK]` b…
ServeurpersoCom Sep 29, 2025
6d477a9
tests: override test_set_rows::max_nmse_err to allow for occasional r…
jeffbolznv Sep 30, 2025
9ca998c
codeowners: add codeowners for opencl backend (#16344)
lhez Sep 30, 2025
55c2cb0
kleidiai : fix work size and threads sync for fp16 (#16246)
chaxu01 Sep 30, 2025
b7f86d8
common : simplify etag tracking by removing json (#16342)
angt Sep 30, 2025
79ec093
metal : dynamic simdgroups for MV kernels (#16340)
ggerganov Sep 30, 2025
ce2071b
cuda : Enable CUDA Graph usage for Nemotron Nano v2 (NemotronH) (#16328)
anavp-nvidia Sep 30, 2025
e8136cb
ggml : bump version to 0.9.4 (ggml/1363)
ggerganov Sep 30, 2025
637732f
ci : disable ccache for android (#16348)
CISC Sep 30, 2025
b32dde0
common : remove common_has_curl() (#16351)
angt Sep 30, 2025
e5e46df
opencl: support ne3 in get_rows (#15866)
lhez Sep 30, 2025
b05167d
ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)
reeselevine Sep 30, 2025
3684b88
Chatapi ignore empty sampling (#16330)
ServeurpersoCom Sep 30, 2025
4524a19
opencl: support pad_ext (#15888)
lhez Sep 30, 2025
8f4603e
common : disable progress bar without a tty (#16352)
angt Sep 30, 2025
b0f974b
ci : fix ccache key for ubuntu-cpu-cmake (#16355)
CISC Sep 30, 2025
8cb89e1
model : support GLM 4.6 (make a few NextN/MTP tensors not required) (…
bartowski1182 Sep 30, 2025
b3061db
webui: Remove running `llama-server` within WebUI `dev.sh` script (#1…
allozaur Oct 1, 2025
51d50b9
vulkan: make ggml_vk_default_dispatcher support older vulkan headers …
netrunnereve Oct 1, 2025
0fb527c
Add optional setting for showing "Model used:" information (#16337)
allozaur Oct 1, 2025
d8356c7
ci : use registry cache for docker builds (#16366)
CISC Oct 1, 2025
e6382cc
Improve code block color theming (#16325)
allozaur Oct 1, 2025
df29570
Conversation action dialogs as singletons from Chat Sidebar + apply c…
allozaur Oct 1, 2025
ea609f5
common: introduce http.h for httplib-based client (#16373)
angt Oct 1, 2025
203e157
ci: Properly install rocwmma for hip builds (#16305)
IMbackK Oct 1, 2025
9376cb8
llama : parameter conversion and loading fixes for PLaMo2 variants (#…
mitmul Oct 1, 2025
3b6b223
HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.…
IMbackK Oct 1, 2025
d5062cf
CI: reenable cdna in rocm docker builds (#16376)
IMbackK Oct 1, 2025
3d70842
HIP: add IMbackK to codeowner (#16375)
IMbackK Oct 2, 2025
c400017
SYCL: Update to oneAPI 2025.2 (#16371)
NeoZhangJianyu Oct 2, 2025
a3066ee
ci : fix clean-up of old logs (#16381)
ggerganov Oct 2, 2025
d683cbb
ci: update vulkan ci (#16294)
netrunnereve Oct 2, 2025
9beeea2
ci : fix ubuntu-latest-cmake-rpc (disable ccache) (#16388)
CISC Oct 2, 2025
8a1a8b4
musa: update compile flags (#16265)
yeahdongcn Oct 2, 2025
2401dea
model : Apertus model implementation (#15852)
pwilkin Oct 2, 2025
ac81e42
ggml webgpu: add support for soft_max, optimize rms_norm (#16357)
reeselevine Oct 2, 2025
1bdff01
test-barrier : do not use more threads than physically available (#16…
CISC Oct 2, 2025
9cf2466
fix: track viewportHeight via window.innerHeight to avoid unwanted sc…
ServeurpersoCom Oct 3, 2025
29e391e
webui : Fix messages payload sent to chat completions (#16402)
allozaur Oct 3, 2025
eb76a30
vulkan: in flash attention, bounds check against nem1 (don't rely on …
jeffbolznv Oct 3, 2025
74704df
Capture model name only after first token (streaming) or completed re…
allozaur Oct 3, 2025
9b3adcf
ci : change macos-13 to macos-15-intel (#16401)
danbev Oct 3, 2025
1e05b4e
vulkan: Fix FA coopmat1 invalid array indexing (#16365)
jeffbolznv Oct 3, 2025
3ef1cce
vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (#1…
jeffbolznv Oct 3, 2025
bdec5a2
Fix missing messages on sibling navigation (#16408)
allozaur Oct 3, 2025
5acd3e8
ggml : fix graph reallocation with multiple chunks (#16396)
Acly Oct 3, 2025
9e84fbf
llama : fix shapes for bert/mpt q/k norm (#16409)
CISC Oct 3, 2025
475678f
metal : fix loop bound in ggml_mem_ranges (#16412)
ggerganov Oct 3, 2025
5eff7c1
server : context checkpointing for hybrid and recurrent models (#16382)
ddh0 Oct 3, 2025
48cf3db
chat : support Magistral thinking (#16413)
ServeurpersoCom Oct 3, 2025
f78c8d8
vulkan : incremental shader builds (#16341)
Acly Oct 4, 2025
c12d919
rpc : add support for multiple devices (#16276)
rgerganov Oct 4, 2025
33ee9f7
rpc : check src buffer when copying tensor (#16421)
rgerganov Oct 4, 2025
a4d4236
vulkan: use a more appropriate amount of threads when generating shad…
netrunnereve Oct 4, 2025
db10b7a
ggml webgpu: actually add softmax, fix rms_norm offset (#16400)
reeselevine Oct 5, 2025
94fb727
model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)
gabe-l-hart Oct 5, 2025
0e54749
server: update readme to mention n_past_max metric (#16436)
okuvshynov Oct 6, 2025
04c340f
nix : removed metal for nix (#16118)
yuannan Oct 6, 2025
c3d2fdd
ggml-cpu : fix leftover handling in ggml_vec_scale_f32 for SVE (#16443)
danbev Oct 6, 2025
74321e2
ci : remove missing reranker model files (#16444)
danbev Oct 6, 2025
1244ada
ggml : fix unaligned access in AMX code (#16315)
ggerganov Oct 6, 2025
2f6fd3e
ci : refactor sdk caching to minimize storage (#16414)
CISC Oct 6, 2025
6914cd2
chat : Granite Docling stopping (#16438)
gabe-l-hart Oct 6, 2025
bae34f0
llama : add --no-host to disable host buffers (#16310)
Gadflyii Oct 6, 2025
96c4732
metal : various optimizations + refactoring (#16446)
ggerganov Oct 7, 2025
a59838d
tests : add -INF blocks to the KQ mask in the FA tests (#16380)
ggerganov Oct 7, 2025
c81638c
metal : add support for non-padded FA KV (#16148)
ggerganov Oct 7, 2025
c8b914f
memory : use sequential equal splits for recurrent modules (#16442)
ggerganov Oct 7, 2025
ac6274e
rpc : update documentation (#16441)
rgerganov Oct 7, 2025
c95d3be
presets : fix pooling param for embedding models (#16455)
ggerganov Oct 7, 2025
3c5291c
webui : added download action (#13552) (#16282)
srogmann Oct 7, 2025
542bee8
server : add `/v1/health` endpoint (#16461)
ggerganov Oct 7, 2025
ff4bf58
llama : support LiquidAI LFM2-MoE hybrid model (#16464)
tdakhran Oct 7, 2025
d1ff4d4
ggml webgpu: profiling, CI updates, reworking of command submission (…
reeselevine Oct 7, 2025
c20f70b
server : improve context checkpoint logic (#16440)
ggerganov Oct 8, 2025
5ceda55
metal : mark FA blocks (#16372)
ggerganov Oct 8, 2025
59dee5d
server : fix cancel pending task (#16467)
issixx Oct 8, 2025
30115cf
Disable CUDA host buffers on integrated GPUs (#16308)
ai-fonsi Oct 8, 2025
2ed098f
refactor: centralize CoT parsing in backend for streaming mode (#16394)
ServeurpersoCom Oct 8, 2025
2d3be68
model: EmbeddingGemma Adding Support for SentenceTransformers Dense M…
sfallah Oct 9, 2025
d3e2ecc
[SYCL] refactor soft_max, add soft_max_back (#16472)
NeoZhangJianyu Oct 9, 2025
fc96244
kleidiai: kernel interface refactoring (#16460)
chaxu01 Oct 9, 2025
e7f4508
CANN: Improve ACL graph matching (#16166)
noemotiovon Oct 9, 2025
79a0378
ci: add ARM64 Kleidiai build and test support (#16462)
sudhiarm Oct 9, 2025
34b1ad3
model-conversion : add support for SentenceTransformers (#16387)
danbev Oct 9, 2025
84b7e4f
No markdown in cot (#16483)
ServeurpersoCom Oct 9, 2025
7a35736
server : host-memory prompt caching (#16391)
ggerganov Oct 9, 2025
be62614
cpu : optimize the ggml NORM operation (#15953)
duduta Oct 9, 2025
61bb80d
webui: updated the chat service to only include max_tokens in the req…
ServeurpersoCom Oct 9, 2025
5e1b18f
cmake : Dont define XOPENSOURCE on AIX (#16481)
mehendarkarprajwal Oct 10, 2025
a70c2f3
server : log requests to /v1/completions (#16495)
rgerganov Oct 10, 2025
055dafc
server : return HTTP 400 if prompt exceeds context length (#16486)
rgerganov Oct 10, 2025
c862b0e
vocab : mark EOT token for Granite models (#16499)
ggerganov Oct 10, 2025
ebf89db
server : fix division by zero when reporting stats (#16501)
ggerganov Oct 10, 2025
c3e195f
convert : correctly handle LLaMA tokenizer for Jamba (#16470)
amirai21 Oct 11, 2025
95d06c2
cuda : avoid initializing unused devices (#16510)
slaren Oct 11, 2025
08ddcdd
server / ranking : add sorting and management of top_n (#16403)
YannFollet Oct 11, 2025
fc7de03
feat: render user content as markdown option (#16358)
ServeurpersoCom Oct 11, 2025
c7a17b8
metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494)
ggerganov Oct 11, 2025
fe6f07c
CUDA: faster tile FA, add oob checks, more HSs (#16492)
JohannesGaessler Oct 11, 2025
535afa5
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518)
sirus20x6 Oct 12, 2025
42d3bfd
hparams : add check for layer index in is_recurrent (#16511)
danbev Oct 12, 2025
4a25717
ggml : Fix FP16 ELU positive branch (#16519)
sirus20x6 Oct 12, 2025
a2d0199
common : update presets (#16504)
ggerganov Oct 12, 2025
1718cfa
common : handle unicode during partial json parsing (#16526)
aldehir Oct 12, 2025
b37105d
ci : add Vulkan on Ubuntu with default packages build (#16532)
mbaudier Oct 12, 2025
bfb4912
[SYCL] fix UT fault cases: count-equal, argsort, pad OPs (#16521)
NeoZhangJianyu Oct 12, 2025
ae71fc0
webui: remove client-side context pre-check and rely on backend for l…
ServeurpersoCom Oct 12, 2025
460b03d
metal : add opt_step_adamw and op_sum (#16529)
cern1710 Oct 12, 2025
1ebf5b7
CANN: Update several operators to support FP16 data format (#16251)
hipudding Oct 13, 2025
324337d
ggml : fix scalar path for computing norm (#16558)
ggerganov Oct 13, 2025
f21f48a
metal: add support for opt_step_sgd (#16539)
cern1710 Oct 13, 2025
1702939
fix: add remark plugin to render raw HTML as literal text (#16505)
ServeurpersoCom Oct 13, 2025
c9eb359
CANN: fix CPU memory leak in CANN backend (#16549)
noemotiovon Oct 13, 2025
1a0d879
ggml : fix build broken with -march=armv9-a on MacOS (#16520)
DamonFool Oct 13, 2025
df83700
CUDA: fix numerical issues in tile FA kernel (#16540)
JohannesGaessler Oct 13, 2025
9db0ead
opencl: fix build targeting CL 2 (#16554)
lhez Oct 13, 2025
96f9e03
graph : support cacheless embeddings with FA and iSWA (#16528)
ggerganov Oct 13, 2025
5643f23
metal : FA support F32 K and V and head size = 32 (#16531)
ggerganov Oct 13, 2025
5a6bf4e
server : dynamic token limit for prompt cache (#16560)
ggerganov Oct 14, 2025
6970d1f
cuda : remove legacy copy-op pointer indirection code (#16485)
anavp-nvidia Oct 14, 2025
9c5ed34
CUDA: add fp kernel for larger batch size MoE (#16512)
am17an Oct 14, 2025
bd28c71
CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557)
am17an Oct 14, 2025
8cd1850
CUDA: enable FA for FP32 KV cache (#16546)
JohannesGaessler Oct 14, 2025
9f9ee43
vulkan: Improve build time for MSVC (#16545)
jeffbolznv Oct 14, 2025
ac97d9b
vulkan: Support FA with K/V in F32 (#16543)
jeffbolznv Oct 14, 2025
cabf6ff
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion …
am17an Oct 14, 2025
2130ea8
vulkan: Add ACC_TYPE_VEC2 implementation (#16203)
SavicStefan Oct 14, 2025
db630d5
metal : avoid using Metal's gpuAddress property (#16576)
ggerganov Oct 14, 2025
1c709d1
server : fix mtmd checkpoints (#16591)
ggerganov Oct 15, 2025
54bb97e
CUDA: Changing the CUDA scheduling strategy to spin (#16585)
JTischbein Oct 15, 2025
45c40ee
llama-quant: add support for mmproj (#16592)
ngxson Oct 15, 2025
5248535
server : fix img token logs (#16595)
ggerganov Oct 15, 2025
0ed6745
metal: optimise `GGML_OP_SUM` (#16559)
cern1710 Oct 15, 2025
f0039fe
Add server-driven parameter defaults and syncing (#16515)
allozaur Oct 15, 2025
2b6a3c3
opencl: fix FA for f32 (#16584)
lhez Oct 15, 2025
7e18f24
opencl: add q8_0 mm support (#16469)
lhez Oct 15, 2025
95427cb
cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (#16083)
safranowith Oct 15, 2025
b99f8c2
gguf-py : add support for endian conversion of BF16 data (#16594)
AlekseiNikiforovIBM Oct 15, 2025
393b7c6
SYCL: Add GGML_OP_MEAN operator support (#16009)
yael-works Oct 16, 2025
9990d81
ggml-cpu: replace putenv with setenv for const-correctness (#16573)
otegami Oct 16, 2025
8818d36
common : Update the docs on -t --threads (#16236)
takasurazeem Oct 16, 2025
b2e8671
CANN: format code using .clang-format (#15863)
noemotiovon Oct 16, 2025
726405c
sycl : add ARANGE operator (#16362)
GittyBurstein Oct 16, 2025
29750a9
fix: added a normalization step for MathJax-style \[\] and \(\) delim…
ServeurpersoCom Oct 16, 2025
588824c
mtmd : support home-cooked Mistral Small Omni (#14928)
ngxson Oct 16, 2025
a68daf0
SYCL SET operator optimized for F32 tensors (#16350)
GittyBurstein Oct 17, 2025
b590ae4
grammar : use int64_t to avoid int overflows in int schema to grammar…
ochafik Oct 17, 2025
aa9e70a
metal : add `CONV_TRANSPOSE_2D` (#16542)
iliailmer Oct 17, 2025
f9efc08
vulkan: fix debug build (add_rms_len/data not found) (#16624)
jeffbolznv Oct 17, 2025
01de0fa
webui: reorganize settings layout (#16607)
ServeurpersoCom Oct 17, 2025
f51e2f7
ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)
muggle-stack Oct 17, 2025
557ca9b
vulkan: Add State Space Model (SSM) Operations Support (#16463)
giuseppe Oct 17, 2025
0a52f5a
rpc : report actual free memory (#16616)
rgerganov Oct 17, 2025
6a04bac
llama-model: fix insonsistent ctxs <-> bufs order (#16581)
JohannesGaessler Oct 17, 2025
586ef08
opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)
shawngu-quic Oct 18, 2025
ac040c3
CUDA: use registers instead of smem in topk-moe (#16647)
am17an Oct 18, 2025
a66d9cb
vulkan: Implement topk_moe fused shader, ported from CUDA (#16641)
jeffbolznv Oct 18, 2025
718de8b
HIP: fix GPU_TARGETS (#16642)
JohannesGaessler Oct 18, 2025
20fe2b7
CODEOWNERS: update for ggml-cuda/mmf (#16660)
am17an Oct 19, 2025
f146809
ci: include s390x release binaries (#16648)
taronaeo Oct 19, 2025
0023d90
ci : avoid manual updates of docs/ops.md (#16663)
CISC Oct 19, 2025
0aa82c5
ci : fix binaries release failure for s390x (binaries may not work ye…
taronaeo Oct 19, 2025
6b38a72
model : add Granite Hybrid types (#16635)
giuseppe Oct 19, 2025
e97a19b
llama-context: only warn on pooling_type when user specified (#16674)
otegami Oct 20, 2025
4c75900
SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16…
safranowith Oct 20, 2025
595453c
readme: update bindings (#16651)
deadprogram Oct 20, 2025
d82537b
llama-batch: fix build fails with `-Werror=missing-braces` (#16614)
otegami Oct 20, 2025
5290956
Enable per-conversation loading states to allow having parallel conve…
allozaur Oct 20, 2025
e4a83f0
Import/Export UX improvements (#16619)
allozaur Oct 20, 2025
1d4e654
Prevent premature submission on IME input (#16673)
allozaur Oct 20, 2025
b5d1a17
ggml-alloc : fix leak when reusing a tensor with a larger size (#16679)
slaren Oct 20, 2025
e2aad4c
Handle legacy 'context' attachments (#16687)
allozaur Oct 20, 2025
2b33ba9
model : add BailingMoeV2 support (#16063)
CISC Oct 20, 2025
c50d04c
sycl : add PAD_REFLECT_D1 operator support (#16145)
ye-NX Oct 20, 2025
738c1e8
vulkan: Handle FA with all -inf mask values (#16447)
jeffbolznv Oct 21, 2025
7ad39a3
opencl: fix warnings and clean up profiling (#16688)
lhez Oct 21, 2025
52d7866
ggml: add ggml_can_fuse_subgraph (#16662)
am17an Oct 21, 2025
5f157a9
CUDA: better error for FA kernel with 0 occupancy (#16643)
JohannesGaessler Oct 21, 2025
8c01a63
CUDA: topk-moe: add optional parameter for gpt-oss (#16649)
am17an Oct 21, 2025
718c892
CUDA: fix bug in topk-moe softmax (#16711)
am17an Oct 22, 2025
c7edfc2
tests : fix test-thread-safety when compiling with multiple backends …
Acly Oct 22, 2025
f36d31a
ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_v…
sirus20x6 Oct 22, 2025
6819ea7
webui: introduce OpenAI-compatible model selector in JSON payload (#1…
ServeurpersoCom Oct 22, 2025
8106777
Revert "ggml : Leverage the existing GGML_F32_VEC helpers to vectoriz…
slaren Oct 22, 2025
0e04b43
Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)
max-krasnyansky Oct 22, 2025
443c17e
sycl: use async memory allocation to fix crashes during graph recordi…
mmichel11 Oct 23, 2025
3a95385
server : send partial stop string when <EOG> is reached (#15007)
matteoserva Oct 23, 2025
0f4fedf
ggml-cuda: use passed ops instead of hardcoded ops (#16712)
am17an Oct 23, 2025
1268cf7
Manually link -lbsd to resolve flock symbol on AIX (#16610)
mehendarkarprajwal Oct 23, 2025
bcb8ed4
mtmd-cli : allow using --jinja (#16718)
ngxson Oct 23, 2025
3049040
convert : Make mistral-common dependency optional (#16738)
juliendenize Oct 23, 2025
29de86b
Cleanup & remove debugging stuff
pwilkin Oct 23, 2025
fbe0e22
Cleanup more debug stuff and flake / editorconfig errors
pwilkin Oct 23, 2025
1aed3d7
Merge remote-tracking branch 'origin/master' into qwen3_next
pwilkin Oct 23, 2025
729ebf8
Unified Delta.net
pwilkin Oct 23, 2025
f2c8be1
Merge remote-tracking branch 'origin/master' into qwen3_next
pwilkin Oct 31, 2025
8edcc4d
Update ggml/include/ggml.h [no ci]
pwilkin Oct 31, 2025
4fbd224
Update ggml/include/ggml.h [no ci]
pwilkin Oct 31, 2025
8cfd6d9
Update ggml/include/ggml.h [no ci]
pwilkin Oct 31, 2025
2825263
Update ggml/src/ggml.c [no ci]
pwilkin Oct 31, 2025
b9a6294
Update ggml/src/ggml.c [no ci]
pwilkin Oct 31, 2025
5112380
Update ggml/include/ggml.h [no ci]
pwilkin Oct 31, 2025
61667c3
Restore comment
pwilkin Oct 31, 2025
5ecbe6e
Update ggml/src/ggml-cpu/ops.cpp
pwilkin Nov 5, 2025
f63e270
Apply graph reduction changes
pwilkin Nov 6, 2025
30 changes: 30 additions & 0 deletions convert_hf_to_gguf.py
@@ -4043,6 +4043,36 @@ def set_vocab(self):
super().set_vocab()


@ModelBase.register("Qwen3NextForCausalLM")
class Qwen3NextModel(Qwen3MoeModel):
model_arch = gguf.MODEL_ARCH.QWEN3NEXT

def set_gguf_parameters(self):
super().set_gguf_parameters()
self.gguf_writer.add_ssm_conv_kernel(self.find_hparam(["linear_conv_kernel_dim"]))
self.gguf_writer.add_ssm_state_size(self.find_hparam(["linear_key_head_dim"]))
self.gguf_writer.add_ssm_group_count(self.find_hparam(["linear_num_key_heads"]))
self.gguf_writer.add_ssm_time_step_rank(self.find_hparam(["linear_num_value_heads"]))
self.gguf_writer.add_ssm_inner_size(self.find_hparam(['linear_value_head_dim']) * self.find_hparam(['linear_num_value_heads']))
if (rope_dim := self.hparams.get("head_dim")) is None:
rope_dim = self.hparams["hidden_size"] // self.hparams["num_attention_heads"]
self.gguf_writer.add_rope_dimension_count(int(rope_dim * self.hparams.get("partial_rotary_factor", 0.25)))

def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
if name.startswith("mtp"):
return [] # ignore MTP layers for now
if name.endswith(".A_log"):
data_torch = -torch.exp(data_torch)
elif name.endswith(".dt_bias"):
name = name.rpartition(".dt_bias")[0] + ".dt_proj.bias"
elif "conv1d" in name:
data_torch = data_torch.squeeze()
elif name.endswith("norm.weight") and not name.endswith("linear_attn.norm.weight"):
data_torch = data_torch + 1

yield from Qwen2MoeModel.modify_tensors(self, data_torch, name, bid)


@ModelBase.register("Qwen3VLForConditionalGeneration", "Qwen3VLMoeForConditionalGeneration")
class Qwen3VLVisionModel(MmprojModel):
def __init__(self, *args, **kwargs):
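As a quick illustration of the tensor rewrites in Qwen3NextModel.modify_tensors above, here is a minimal PyTorch sketch; the shapes are invented for the example, and only the transforms themselves come from the diff:

    import torch

    A_log = torch.rand(8)              # ".A_log" tensors are stored as logs;
    A = -torch.exp(A_log)              # the converter flips them to negative decays

    conv1d_w = torch.rand(16, 1, 4)    # ".conv1d" weights lose their singleton dims
    conv1d_w = conv1d_w.squeeze()      # -> shape (16, 4)

    norm_w = torch.zeros(32)           # non-linear_attn "norm.weight" tensors are
    norm_w = norm_w + 1                # stored zero-centered, so 1 is added back

".dt_bias" is a pure rename to ".dt_proj.bias", and "mtp*" (multi-token prediction) tensors are dropped entirely for now.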
@@ -4,6 +4,11 @@ set -e

# First try command line argument, then environment variable, then file
CONVERTED_MODEL="${1:-"$CONVERTED_MODEL"}"
MODEL_TESTING_PROMPT="${2:-"$MODEL_TESTING_PROMPT"}"

if [ -z "$MODEL_TESTING_PROMPT" ]; then
MODEL_TESTING_PROMPT="Hello, my name is"
fi

# Final check if we have a model path
if [ -z "$CONVERTED_MODEL" ]; then
@@ -14,7 +19,8 @@ if [ -z "$CONVERTED_MODEL" ]; then
fi

echo $CONVERTED_MODEL
echo $MODEL_TESTING_PROMPT

cmake --build ../../build --target llama-logits -j8

../../build/bin/llama-logits -m "$CONVERTED_MODEL" "Hello, my name is"
../../build/bin/llama-logits -m "$CONVERTED_MODEL" "$MODEL_TESTING_PROMPT"
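With this change the prompt is optional and can come from either the second positional argument or the environment; assuming the surrounding script is the converted-model runner invoked directly, a typical call would be `./run-converted-model.sh path/to/model.gguf "Once upon a time"`, with "Hello, my name is" as the fallback when no prompt is given.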
8 changes: 6 additions & 2 deletions examples/model-conversion/scripts/causal/run-org-model.py
@@ -185,8 +185,12 @@ def fn(_m, input, output):
# of using AutoModelForCausalLM.
print(f"Model class: {model.__class__.__name__}")

prompt = "Hello, my name is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
device = next(model.parameters()).device
if os.getenv("MODEL_TESTING_PROMPT"):
prompt = os.getenv("MODEL_TESTING_PROMPT")
else:
prompt = "Hello, my name is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

print(f"Input tokens: {input_ids}")
print(f"Input text: {repr(prompt)}")
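Two small but useful changes here: input_ids is now moved to the model's own device (next(model.parameters()).device), which avoids a CPU/GPU tensor mismatch when the reference model loads onto an accelerator, and the test prompt can be overridden through the same MODEL_TESTING_PROMPT environment variable as the shell runner, e.g. `MODEL_TESTING_PROMPT="Once upon a time" python examples/model-conversion/scripts/causal/run-org-model.py` (remaining arguments as the script normally takes them).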
48 changes: 48 additions & 0 deletions ggml/include/ggml.h
@@ -475,6 +475,7 @@ extern "C" {
GGML_OP_COS,
GGML_OP_SUM,
GGML_OP_SUM_ROWS,
GGML_OP_CUMSUM,
GGML_OP_MEAN,
GGML_OP_ARGMAX,
GGML_OP_COUNT_EQUAL,
@@ -530,6 +531,7 @@ extern "C" {
GGML_OP_TIMESTEP_EMBEDDING,
GGML_OP_ARGSORT,
GGML_OP_LEAKY_RELU,
GGML_OP_TRI,

GGML_OP_FLASH_ATTN_EXT,
GGML_OP_FLASH_ATTN_BACK,
@@ -542,6 +544,7 @@ extern "C" {
GGML_OP_RWKV_WKV6,
GGML_OP_GATED_LINEAR_ATTN,
GGML_OP_RWKV_WKV7,
GGML_OP_SOLVE_TRI,

GGML_OP_UNARY,

@@ -576,6 +579,8 @@ extern "C" {
GGML_UNARY_OP_HARDSWISH,
GGML_UNARY_OP_HARDSIGMOID,
GGML_UNARY_OP_EXP,
GGML_UNARY_OP_EXPM1,
GGML_UNARY_OP_SOFTPLUS,
GGML_UNARY_OP_GELU_ERF,
GGML_UNARY_OP_XIELU,
GGML_UNARY_OP_FLOOR,
@@ -620,6 +625,13 @@ extern "C" {
GGML_TENSOR_FLAG_LOSS = 8, // ...defines loss for numerical optimization (multiple loss tensors add up)
};

enum ggml_tri_type {
GGML_TRI_TYPE_UPPER_DIAG = 0,
GGML_TRI_TYPE_UPPER = 1,
GGML_TRI_TYPE_LOWER_DIAG = 2,
GGML_TRI_TYPE_LOWER = 3
};

struct ggml_init_params {
// memory pool
size_t mem_size; // bytes
@@ -957,6 +969,22 @@ extern "C" {
struct ggml_context * ctx,
struct ggml_tensor * a);

GGML_API struct ggml_tensor * ggml_expm1(
struct ggml_context * ctx,
struct ggml_tensor * a);

GGML_API struct ggml_tensor * ggml_expm1_inplace(
struct ggml_context * ctx,
struct ggml_tensor * a);

GGML_API struct ggml_tensor * ggml_softplus(
struct ggml_context * ctx,
struct ggml_tensor * a);

GGML_API struct ggml_tensor * ggml_softplus_inplace(
struct ggml_context * ctx,
struct ggml_tensor * a);

GGML_API struct ggml_tensor * ggml_sin(
struct ggml_context * ctx,
struct ggml_tensor * a);
@@ -983,6 +1011,10 @@ extern "C" {
struct ggml_context * ctx,
struct ggml_tensor * a);

GGML_API struct ggml_tensor * ggml_cumsum(
struct ggml_context * ctx,
struct ggml_tensor * a);

// mean along rows
GGML_API struct ggml_tensor * ggml_mean(
struct ggml_context * ctx,
@@ -2186,6 +2218,17 @@ extern "C" {
int shift2,
int shift3);

// Make matrix into a triangular one (upper, upper + diagonal, lower or lower + diagonal) with constant value
GGML_API struct ggml_tensor * ggml_tri(
struct ggml_context * ctx,
struct ggml_tensor * a,
float constant,
enum ggml_tri_type tritype);

GGML_API struct ggml_tensor * ggml_tri_keep(
struct ggml_context * ctx,
struct ggml_tensor * a,
enum ggml_tri_type tritype);

// Ref: https://github.com/CompVis/stable-diffusion/blob/main/ldm/modules/diffusionmodules/util.py#L151
// timesteps: [N,]
@@ -2355,6 +2398,11 @@ extern "C" {
struct ggml_tensor * b,
struct ggml_tensor * state);

GGML_API struct ggml_tensor * ggml_solve_tri(
struct ggml_context * ctx,
struct ggml_tensor * a,
struct ggml_tensor * x);

// custom operators

typedef void (*ggml_custom1_op_t)(struct ggml_tensor * dst , const struct ggml_tensor * a, int ith, int nth, void * userdata);
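The header fixes the names but not the exact semantics, so here is a minimal NumPy sketch of what the new operators plausibly compute; the axis choice, the triangle/constant convention for ggml_tri, and the reading of ggml_solve_tri as a triangular solve are assumptions, not taken from this diff:

    import numpy as np
    from scipy.linalg import solve_triangular

    x = np.array([0.5, 1.0, 2.0], dtype=np.float32)

    # GGML_UNARY_OP_EXPM1 / GGML_UNARY_OP_SOFTPLUS, standard definitions:
    #   expm1(x) = e^x - 1, softplus(x) = log(1 + e^x)
    expm1 = np.expm1(x)
    softplus = np.log1p(np.exp(x))

    # GGML_OP_CUMSUM, assumed to run along rows like ggml_sum_rows
    a = np.arange(6, dtype=np.float32).reshape(2, 3)
    cumsum = np.cumsum(a, axis=-1)

    # GGML_OP_TRI: per the header comment, ggml_tri fills the selected triangle
    # (upper/lower, with or without the diagonal) with a constant value, and
    # ggml_tri_keep is assumed to keep a's values there and zero the rest.
    m = np.arange(9, dtype=np.float32).reshape(3, 3)
    mask = np.tril(np.ones((3, 3), dtype=bool))    # GGML_TRI_TYPE_LOWER_DIAG
    tri      = np.where(mask, 0.5, 0.0)            # ~ ggml_tri(ctx, a, 0.5, LOWER_DIAG)
    tri_keep = np.where(mask, m, 0.0)              # ~ ggml_tri_keep(ctx, a, LOWER_DIAG)

    # GGML_OP_SOLVE_TRI, assumed to solve the triangular system a y = x for y
    L = np.tril(np.random.rand(3, 3).astype(np.float32)) + np.eye(3, dtype=np.float32)
    b = np.random.rand(3, 1).astype(np.float32)
    y = solve_triangular(L, b, lower=True)
    assert np.allclose(L @ y, b, atol=1e-5)

In a delta-net style recurrence, which is what this PR's Qwen3-Next linear-attention layers use, cumulative sums, triangular masking, and a triangular solve are the standard building blocks for the per-chunk decay terms, which is presumably why these ops arrive together.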
17 changes: 17 additions & 0 deletions ggml/src/ggml-cpu/ggml-cpu.c
@@ -1731,6 +1731,10 @@ static void ggml_compute_forward(struct ggml_compute_params * params, struct ggm
{
ggml_compute_forward_sum_rows(params, tensor);
} break;
case GGML_OP_CUMSUM:
{
ggml_compute_forward_cumsum(params, tensor);
} break;
case GGML_OP_MEAN:
{
ggml_compute_forward_mean(params, tensor);
@@ -1943,6 +1947,10 @@ static void ggml_compute_forward(struct ggml_compute_params * params, struct ggm
{
ggml_compute_forward_leaky_relu(params, tensor);
} break;
case GGML_OP_TRI:
{
ggml_compute_forward_tri(params, tensor);
} break;
case GGML_OP_FLASH_ATTN_EXT:
{
ggml_compute_forward_flash_attn_ext(params, tensor);
@@ -1998,6 +2006,10 @@ static void ggml_compute_forward(struct ggml_compute_params * params, struct ggm
{
ggml_compute_forward_rwkv_wkv7(params, tensor);
} break;
case GGML_OP_SOLVE_TRI:
{
ggml_compute_forward_solve_tri(params, tensor);
} break;
case GGML_OP_MAP_CUSTOM1:
{
ggml_compute_forward_map_custom1(params, tensor);
@@ -2153,10 +2165,13 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) {
case GGML_OP_SUM_ROWS:
case GGML_OP_MEAN:
case GGML_OP_ARGMAX:
case GGML_OP_CUMSUM:
case GGML_OP_TRI:
{
n_tasks = 1;
} break;
case GGML_OP_COUNT_EQUAL:
case GGML_OP_SOLVE_TRI:
{
n_tasks = n_threads;
} break;
@@ -2179,6 +2194,8 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) {
case GGML_UNARY_OP_HARDSWISH:
case GGML_UNARY_OP_HARDSIGMOID:
case GGML_UNARY_OP_EXP:
case GGML_UNARY_OP_SOFTPLUS:
case GGML_UNARY_OP_EXPM1:
case GGML_UNARY_OP_FLOOR:
case GGML_UNARY_OP_CEIL:
case GGML_UNARY_OP_ROUND:
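One scheduling detail worth noting: in ggml_get_n_tasks the new CUMSUM and TRI ops are pinned to a single task (n_tasks = 1) alongside the other cheap row-wise ops, while SOLVE_TRI joins COUNT_EQUAL in using all n_threads; presumably the triangular solve is the only one of the three expensive enough to be worth splitting across threads, since each right-hand-side column can be solved independently.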