If you encounter the error `ModuleNotFoundError: No module named 'litellm'`, this means you need to install the `litellm` optional dependency: `pip install 'strands-agents[litellm]'`.
## Advanced Features
### Caching
LiteLLM supports provider-agnostic caching through `SystemContentBlock` arrays, allowing you to define cache points that work across all supported model providers. This enables you to reuse parts of previous requests, which can significantly reduce token usage and latency.
#### System Prompt Caching
Use `SystemContentBlock` arrays to define cache points in your system prompts:
```python
from strands import Agent
from strands.models.litellm import LiteLLMModel
from strands.types.content import SystemContentBlock

# Define system content with cache points
system_content = [
    SystemContentBlock(
        text="You are a helpful assistant that provides concise answers. "
        "This is a long system prompt with detailed instructions..."
    ),
    # Cache point marking the preceding content as cacheable
    # (cachePoint shape assumed to follow the Bedrock Converse format)
    SystemContentBlock(cachePoint={"type": "default"}),
]

# Create the model and agent (example model ID; use any model you route through LiteLLM)
model = LiteLLMModel(model_id="anthropic/claude-sonnet-4-20250514")
agent = Agent(model=model, system_prompt=system_content)

# First request writes the system prompt to the cache
response1 = agent("Tell me about Python")
# Cache metrics like cacheWriteInputTokens will be present in response1.metrics.accumulated_usage

# Second request will reuse the cached system prompt
response2 = agent("Tell me about JavaScript")
# Cache metrics like cacheReadInputTokens will be present in response2.metrics.accumulated_usage
```
> **Note**: Caching availability and behavior depend on the underlying model provider accessed through LiteLLM. Some providers may have minimum token requirements or other limitations for cache creation.
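To confirm that caching actually took effect for your provider, a minimal sketch like the following (assuming the `accumulated_usage` metrics shown in the comments above) checks the cache write and read token counts:

```python
# Minimal sketch: check the cache metrics from the two calls above.
# Keys are read defensively since providers that don't cache won't report them.
usage_first = response1.metrics.accumulated_usage
usage_second = response2.metrics.accumulated_usage

print("Cache write tokens (first call):", usage_first.get("cacheWriteInputTokens", 0))
print("Cache read tokens (second call):", usage_second.get("cacheReadInputTokens", 0))
```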
### Structured Output
LiteLLM supports structured output by proxying requests to underlying model providers that support tool calling. The availability of structured output depends on the specific model and provider you're using through LiteLLM.
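As a quick illustration, here is a minimal sketch of requesting structured output through LiteLLM. It assumes the `Agent.structured_output()` helper and a Pydantic model; the model ID is a placeholder for whichever tool-calling model you route through LiteLLM:

```python
from pydantic import BaseModel

from strands import Agent
from strands.models.litellm import LiteLLMModel


class BookAnalysis(BaseModel):
    """Key facts about a book."""
    title: str
    author: str
    summary: str


# Placeholder model ID; any LiteLLM-routed model that supports tool calling should work
model = LiteLLMModel(model_id="anthropic/claude-sonnet-4-20250514")
agent = Agent(model=model)

# Ask the model to return data conforming to the Pydantic schema
result = agent.structured_output(
    BookAnalysis,
    "Analyze this book: The Hitchhiker's Guide to the Galaxy by Douglas Adams.",
)
print(result.title, "-", result.author)
```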