Commit adcb763

Browse files
feat(litellm): add docs for SystemContentBlock caching approach (#328)
1 parent 70e1ecf commit adcb763

docs/user-guide/concepts/model-providers/litellm.md

Lines changed: 41 additions & 0 deletions
@@ -94,6 +94,47 @@ If you encounter the error `ModuleNotFoundError: No module named 'litellm'`, thi
## Advanced Features

### Caching

LiteLLM supports provider-agnostic caching through `SystemContentBlock` arrays, allowing you to define cache points that work across all supported model providers. This lets you reuse parts of previous requests, which can significantly reduce token usage and latency.

#### System Prompt Caching

Use `SystemContentBlock` arrays to define cache points in your system prompts:

```python
from strands import Agent
from strands.models.litellm import LiteLLMModel
from strands.types.content import SystemContentBlock

# Define system content with a cache point placed after the long prompt text
system_content = [
    SystemContentBlock(
        text="You are a helpful assistant that provides concise answers. "
        "This is a long system prompt with detailed instructions..."
        + "..." * 1000  # padding so the prompt meets the ~1,024-token cache minimum
    ),
    SystemContentBlock(cachePoint={"type": "default"})
]

# Create an agent that uses the SystemContentBlock array as its system prompt
model = LiteLLMModel(
    model_id="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
)

agent = Agent(model=model, system_prompt=system_content)

# First request writes the system prompt to the cache
response1 = agent("Tell me about Python")
# Cache metrics like cacheWriteInputTokens will be present in response1.metrics.accumulated_usage

# Second request reuses the cached system prompt
response2 = agent("Tell me about JavaScript")
# Cache metrics like cacheReadInputTokens will be present in response2.metrics.accumulated_usage
```
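To confirm that caching is taking effect, you can inspect the accumulated usage metrics on each response. A minimal sketch, assuming your provider populates the `cacheWriteInputTokens` and `cacheReadInputTokens` fields mentioned in the comments above:

```python
# Continuing the example above: check the cache metrics on each response.
# Which keys are populated depends on the underlying provider accessed through LiteLLM.
usage_first = response1.metrics.accumulated_usage
usage_second = response2.metrics.accumulated_usage

print("Cache write tokens (first call):", usage_first.get("cacheWriteInputTokens", 0))
print("Cache read tokens (second call):", usage_second.get("cacheReadInputTokens", 0))
```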

> **Note**: Caching availability and behavior depend on the underlying model provider accessed through LiteLLM. Some providers may have minimum token requirements or other limitations for cache creation.

### Structured Output

LiteLLM supports structured output by proxying requests to underlying model providers that support tool calling. The availability of structured output depends on the specific model and provider you're using through LiteLLM.
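As a hedged illustration of what this looks like in practice, the sketch below assumes Strands' `Agent.structured_output` method and uses a made-up Pydantic model (`CityInfo`) purely for demonstration; the exact behavior depends on whether the proxied provider supports tool calling.

```python
from pydantic import BaseModel

from strands import Agent
from strands.models.litellm import LiteLLMModel


# Hypothetical output schema used only for this illustration
class CityInfo(BaseModel):
    name: str
    country: str
    population: int


model = LiteLLMModel(
    model_id="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
)
agent = Agent(model=model)

# structured_output issues a tool-calling request through LiteLLM and
# parses the model's response into the Pydantic model
result = agent.structured_output(CityInfo, "Tell me about Tokyo")
print(result.name, result.country, result.population)
```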
