Implement JIT and Auto-unload #97

Snuffy2 · 2025-11-22T16:45:32Z

Leaving this as a draft because it depends on #96. Once #96 is merged, I will rebase and mark it ready to merge.

This pull request adds support for just-in-time (JIT) model loading and for automatically unloading the model after a period of inactivity, improving VRAM management and server responsiveness. It refactors the API endpoint logic to be aware of the JIT and auto-unload states, improves error handling, and adds stricter type checks for handler selection. It also updates the CLI and documentation to expose and explain the new features.

JIT Loading & Auto-Unload Support

Added JIT loading and idle auto-unload, with documentation and CLI flags (--jit, --auto-unload-minutes) that enable deferred model initialization and reclaim VRAM when the server is idle. The /health endpoint now reports the model status as "unloaded" when no model is currently loaded. [1] [2] [3] [4]
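To make the JIT and auto-unload behavior concrete, here is a minimal, hypothetical sketch of the idea: the model is built on first use, every request refreshes an idle timer, and a periodic check drops the model once it has been idle for `auto_unload_minutes`. The class name `LazyModelManager`, its methods, and the `/health`-style payload keys are illustrative assumptions, not the PR's actual code.

```python
import threading
import time
from typing import Callable, Optional


class LazyModelManager:
    """Hypothetical sketch (not the PR's API): load a model on first use (JIT)
    and unload it after `auto_unload_minutes` of inactivity."""

    def __init__(self, loader: Callable[[], object], jit: bool = True,
                 auto_unload_minutes: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._clock = clock  # injectable clock so idle logic can be tested
        self._idle_seconds: Optional[float] = (
            auto_unload_minutes * 60 if auto_unload_minutes is not None else None
        )
        self._lock = threading.Lock()
        self._model: Optional[object] = None
        self._last_used = clock()
        if not jit:
            # Without JIT, the model is loaded eagerly at startup.
            self._model = loader()

    def get(self) -> object:
        """Return the model, loading it on demand, and refresh the idle timer."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            self._last_used = self._clock()
            return self._model

    def maybe_unload(self) -> bool:
        """Drop the model if it has been idle too long; return True if it was unloaded."""
        with self._lock:
            if self._model is None or self._idle_seconds is None:
                return False
            if self._clock() - self._last_used >= self._idle_seconds:
                # Release the only reference so the memory can be reclaimed.
                self._model = None
                return True
            return False

    def health(self) -> dict:
        """Hypothetical /health payload that reports "unloaded" when no model is resident."""
        return {
            "status": "ok",
            "model_status": "loaded" if self._model is not None else "unloaded",
        }
```

In a server, `maybe_unload()` would typically be called from a background task every few seconds; the next request after an unload simply triggers another JIT load through `get()`.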

API Endpoint Refactoring

Introduced a _get_handler_or_error helper for consistent handler retrieval and error reporting, making endpoints aware of the JIT and auto-unload states. All major endpoints now use this helper, which makes them more reliable. [1] [2] [3] [4] [5] [6] [7] [8]
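The pattern behind a helper like `_get_handler_or_error` can be sketched as follows: return either a ready handler or an error object, loading the handler on demand when JIT is enabled, so every endpoint shares the same two-line preamble. This is a framework-free sketch under assumptions: `ServerState`, `ErrorResponse`, `get_handler_or_error`, and `chat_completions_endpoint` are hypothetical names, and the 503 status is an illustrative choice rather than what the PR necessarily returns.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Any, Callable, Optional, Tuple


@dataclass
class ErrorResponse:
    status_code: int
    message: str


@dataclass
class ServerState:
    handler: Optional[Any] = None
    jit_enabled: bool = False
    loader: Optional[Callable[[], Any]] = None  # builds a handler on demand (hypothetical)


def get_handler_or_error(state: ServerState) -> Tuple[Optional[Any], Optional[ErrorResponse]]:
    """Hypothetical helper: return (handler, None) on success or (None, error) on failure."""
    if state.handler is not None:
        return state.handler, None
    if state.jit_enabled and state.loader is not None:
        try:
            # JIT path: the model was never loaded (or was auto-unloaded), so load it now.
            state.handler = state.loader()
        except Exception as exc:
            return None, ErrorResponse(int(HTTPStatus.SERVICE_UNAVAILABLE),
                                       f"Failed to load model: {exc}")
        return state.handler, None
    return None, ErrorResponse(int(HTTPStatus.SERVICE_UNAVAILABLE),
                               "Model handler is not initialized")


def chat_completions_endpoint(state: ServerState, payload: dict) -> Any:
    """Example endpoint body: every endpoint starts with the same check."""
    handler, error = get_handler_or_error(state)
    if error is not None:
        return error  # a real server would convert this into an HTTP error response
    return handler(payload)
```

Centralizing the lookup means the "not loaded yet", "unloaded while idle", and "failed to load" cases are reported the same way by every endpoint.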

Type Safety & Error Handling

Added stricter type checks for handler selection in the embeddings and audio_transcriptions endpoints, returning a clear error when the loaded model is of the wrong type. [1] [2]
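A type check of this kind usually reduces to an `isinstance` test against the handler class an endpoint expects. The sketch below assumes a hypothetical handler hierarchy (`LMHandler`, `EmbeddingsHandler`, `WhisperHandler`) and a hypothetical `require_handler_type` function; the PR's real class names and error wording may differ.

```python
from typing import Optional, Type


# Hypothetical handler hierarchy, for illustration only.
class BaseHandler: ...
class LMHandler(BaseHandler): ...
class EmbeddingsHandler(BaseHandler): ...
class WhisperHandler(BaseHandler): ...


def require_handler_type(handler: BaseHandler, expected: Type[BaseHandler],
                         endpoint: str) -> Optional[str]:
    """Return an error message if `handler` is not an `expected` instance, else None."""
    if not isinstance(handler, expected):
        return (f"{endpoint} requires a {expected.__name__}, "
                f"but the loaded model uses {type(handler).__name__}")
    return None
```

For example, calling an embeddings endpoint while a chat model is loaded yields a message naming both classes instead of failing later with an obscure attribute error.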

Streaming Response Improvements

Improved tool call chunk indexing and ID assignment for streaming chat completions, so that each streamed chunk is associated with the correct tool call.
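In OpenAI-style streaming, each tool call gets a stable `index`; the first delta for that index carries the `id`, `type`, and function name, and later deltas carry only the `index` plus an argument fragment that the client concatenates. The sketch below shows that assignment under assumptions: the input `(call_key, name, fragment)` tuples stand in for whatever the model-output parser produces, and the `call_<hex>` ID format is illustrative.

```python
import uuid
from typing import Dict, Iterable, Iterator, Tuple


def tool_call_deltas(events: Iterable[Tuple[str, str, str]]) -> Iterator[Dict]:
    """Turn hypothetical parser events (call_key, name, argument_fragment)
    into OpenAI-style streamed tool_call deltas."""
    index_of: Dict[str, int] = {}
    for key, name, fragment in events:
        if key not in index_of:
            # First chunk of a new tool call: assign the next index and a fresh ID.
            index_of[key] = len(index_of)
            yield {
                "index": index_of[key],
                "id": f"call_{uuid.uuid4().hex[:24]}",
                "type": "function",
                "function": {"name": name, "arguments": fragment},
            }
        else:
            # Continuation chunk: only the index and the next argument fragment.
            yield {"index": index_of[key], "function": {"arguments": fragment}}
```

A client rebuilds each call by grouping deltas on `index` and joining the `arguments` fragments, which is why a consistent index per call matters.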

CLI & Documentation Enhancements

Refined the UpperChoice class for canonical option normalization and updated the related CLI help text for clarity. [1] [2]
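The idea behind a class like UpperChoice is to accept an option value in any casing and hand the program a single canonical spelling. The PR's class likely builds on its own CLI library; the stdlib `argparse` version below is only an assumed illustration of the normalization. The `--log-level` option and its values are hypothetical, and the `float` type for `--auto-unload-minutes` is an assumption.

```python
import argparse
from typing import Iterable


class UpperChoice:
    """Hypothetical sketch: case-insensitive choice that returns the canonical upper-case value."""

    def __init__(self, choices: Iterable[str]) -> None:
        self.choices = tuple(c.upper() for c in choices)

    def __call__(self, value: str) -> str:
        canonical = value.strip().upper()
        if canonical not in self.choices:
            raise argparse.ArgumentTypeError(
                f"invalid choice: {value!r} (choose from {', '.join(self.choices)})")
        return canonical


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="server")
    # Hypothetical option: any casing is accepted, the canonical form is stored.
    parser.add_argument("--log-level", type=UpperChoice(["debug", "info", "warning", "error"]),
                        default="INFO", help="Logging level (case-insensitive)")
    parser.add_argument("--jit", action="store_true",
                        help="Defer model loading until the first request")
    parser.add_argument("--auto-unload-minutes", type=float, default=None,
                        help="Unload the model after this many idle minutes")
    return parser
```

Normalizing at parse time means the rest of the code compares against one spelling ("DEBUG") instead of handling every casing a user might type.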

Snuffy2 and others added 17 commits November 20, 2025 18:39

Set to ruff and and GitHub Action

acc0c31

First pass

264faba

Second pass

755d76b

Third pass

294b574

Update pyproject.toml

1d171cf

Fourth round

552aa7c

Sixth Round

766957b

Seventh round

48b8d6f

Eigth round

61dddd7

Ninth round

4d7f7bf

Tenth round

8391e53

Eleventh round

9920180

Twelfth round

8b941ca

Thirteenth Round

86d7773

Final? round

c244000

Final cleanup

d128b37

Implement JIT and Auto-Unload

0ea0081
