
Commit e4c8141

[faster-transformers] Fix image link (#3069)
Updated image link for Mistral cache behavior comparison.
Parent: dba28a0 · Commit: e4c8141

faster-transformers.md

Lines changed: 1 addition & 1 deletion
@@ -339,7 +339,7 @@ Many recent LLMs use _sliding window_ attention, or a combination of sliding and
 
 For models that only use sliding window layers, such as Mistral 7B, cache memory stops growing when the sequence reaches the window size (4096, in this case). This makes sense, because the sliding layers can't look past the previous 4K tokens anyway.
 
-![mistral cache behaviour comparison](https://private-user-images.githubusercontent.com/71554963/476701186-e7fb1288-7713-4140-a2b2-1af0a723f76a.png)
+![mistral cache behaviour comparison](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/faster-transformers/mistral-dynamic-cache-with-config.png)
 
 OpenAI gpt-oss alternates between sliding and global attention layers, which results in total KV cache memory being _halved_, as we'll see, as sequence length increases.
 This provides us with:
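
As a back-of-the-envelope check of the cache behavior described in the diff context above, here is a small illustrative sketch (not part of the commit or the blog's code) that estimates KV cache size versus sequence length for full attention, an all-sliding-window model, and an alternating sliding/global layout. The shape values (32 layers, 8 KV heads, head dim 128, 4096-token window, 2-byte bf16 elements) are assumptions roughly matching Mistral 7B and are only there to make the numbers concrete.

```python
# Rough KV-cache size estimate: full attention vs. sliding window vs. alternating layers.
# Assumed shapes approximate Mistral 7B (32 layers, 8 KV heads, head_dim 128, window 4096).

def kv_cache_bytes(seq_len, layers=32, kv_heads=8, head_dim=128,
                   dtype_bytes=2, window=None, sliding_fraction=1.0):
    """Bytes of K+V cached across all layers at a given sequence length.

    window=None          -> every layer caches the full sequence (global attention)
    sliding_fraction=1.0 -> every layer is a sliding-window layer capped at `window`
    sliding_fraction=0.5 -> half the layers slide, half are global (gpt-oss-style)
    """
    per_token_per_layer = 2 * kv_heads * head_dim * dtype_bytes  # K and V for one layer
    sliding_layers = int(layers * sliding_fraction) if window is not None else 0
    global_layers = layers - sliding_layers
    sliding_tokens = min(seq_len, window) if window is not None else 0
    return per_token_per_layer * (global_layers * seq_len + sliding_layers * sliding_tokens)

GiB = 1024 ** 3
for seq_len in (4_096, 16_384, 65_536):
    full = kv_cache_bytes(seq_len)                                      # all-global layers
    sliding = kv_cache_bytes(seq_len, window=4096)                      # Mistral-style: plateaus at 4K
    mixed = kv_cache_bytes(seq_len, window=4096, sliding_fraction=0.5)  # alternating layers
    print(f"{seq_len:>6} tokens  full={full/GiB:.2f} GiB  "
          f"sliding={sliding/GiB:.2f} GiB  alternating={mixed/GiB:.2f} GiB")
```

With these assumed shapes, the all-sliding estimate plateaus at roughly 0.5 GiB once the sequence passes 4096 tokens, while the alternating layout trends toward half of the full-attention cache as the sequence grows, which is the behavior the surrounding paragraphs describe.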
