6 changes: 0 additions & 6 deletions src/guidellm/benchmark/scenarios/chat.json

This file was deleted.

11 changes: 11 additions & 0 deletions src/guidellm/benchmark/scenarios/concurrent-1k-1k-equal.json
@@ -0,0 +1,11 @@
{
  "description": "Prefill/Decode balanced scenario. Note: This scenario is optimized for NVIDIA H200s and may need to be adjusted for other hardware.",
  "profile": "concurrent",
  "request-type": "text_completions",
  "data": {
    "prompt_tokens": 1000,
    "output_tokens": 1000
  },
  "rate": [1, 50, 100, 200, 300, 500, 650],
  "max-seconds": "600"
}
11 changes: 11 additions & 0 deletions src/guidellm/benchmark/scenarios/concurrent-2ki-128-equal.json
@@ -0,0 +1,11 @@
{
  "description": "Prefill heavy scenario. Note: This scenario is optimized for NVIDIA H200s and may need to be adjusted for other hardware.",
  "profile": "concurrent",
  "request-type": "text_completions",
  "data": {
    "prompt_tokens": 2048,
    "output_tokens": 128
  },
  "rate": [1, 50, 100, 200, 300, 500, 650],
  "max-seconds": "600"
}
17 changes: 17 additions & 0 deletions src/guidellm/benchmark/scenarios/concurrent-512-2ki-norm.json
@@ -0,0 +1,17 @@
{
  "description": "Generation heavy scenario with sequence length variance. Note: This scenario is optimized for NVIDIA H200s and may need to be adjusted for other hardware.",
  "profile": "concurrent",
  "request-type": "text_completions",
  "data": {
    "prompt_tokens": 512,
    "prompt_tokens_stdev": 128,
    "prompt_tokens_min": 1,
    "prompt_tokens_max": 1024,
    "output_tokens": 2048,
    "output_tokens_stdev": 512,
    "output_tokens_min": 1,
    "output_tokens_max": 4096
  },
  "rate": [1, 5, 25, 50, 100, 150, 200, 250, 300, 400, 500, 650],
  "max-seconds": "600"
}
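For reviewers unfamiliar with the `_stdev`, `_min`, and `_max` fields, the sketch below shows one plausible reading of them: draw each length from a normal distribution around the mean and clamp it to the stated bounds. This is an assumption made for illustration only; the helper `sample_token_count` is hypothetical and the actual sampler in guidellm may behave differently.

```python
import random


def sample_token_count(mean, stdev, lo, hi, rng):
    """Hypothetical helper: draw one token count from a normal
    distribution and clamp it to the inclusive range [lo, hi]."""
    value = round(rng.gauss(mean, stdev))
    return max(lo, min(hi, value))


# Seeded generator so repeated runs draw the same lengths.
rng = random.Random(0)

# Values copied from the "data" block of concurrent-512-2ki-norm.json.
prompt_lengths = [sample_token_count(512, 128, 1, 1024, rng) for _ in range(1000)]
output_lengths = [sample_token_count(2048, 512, 1, 4096, rng) for _ in range(1000)]
```

Under this reading, the bounds sit four standard deviations from each mean, so clamping would rarely trigger and the lengths would stay close to an unclipped normal distribution.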
11 changes: 11 additions & 0 deletions src/guidellm/benchmark/scenarios/concurrent-8k-1k-equal.json
@@ -0,0 +1,11 @@
{
  "description": "Large context scenario. Note: This scenario is optimized for NVIDIA H200s and may need to be adjusted for other hardware.",
  "profile": "concurrent",
  "request-type": "text_completions",
  "data": {
    "prompt_tokens": 8000,
    "output_tokens": 1000
  },
  "rate": [1, 50, 100, 200, 300, 500, 650],
  "max-seconds": "600"
}
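As a quick sanity check on the scenario descriptions, the sketch below computes the prompt-to-output token ratio for each of the four added files, using the mean token counts copied from their `data` blocks. The helper name `prompt_to_output_ratio` is hypothetical and the ratio is only a rough proxy for the prefill/decode balance, not a metric defined by guidellm.

```python
# Mean (prompt_tokens, output_tokens) per scenario, taken from the JSON files in this diff.
scenarios = {
    "concurrent-1k-1k-equal": (1000, 1000),
    "concurrent-2ki-128-equal": (2048, 128),
    "concurrent-512-2ki-norm": (512, 2048),
    "concurrent-8k-1k-equal": (8000, 1000),
}


def prompt_to_output_ratio(prompt_tokens, output_tokens):
    """Hypothetical helper: ratio above 1 leans toward prefill, below 1 toward decode."""
    return prompt_tokens / output_tokens


ratios = {
    name: prompt_to_output_ratio(prompt, output)
    for name, (prompt, output) in scenarios.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The ratios line up with the descriptions: 1.0 for the balanced scenario, 16.0 for the prefill-heavy one, 0.25 for the generation-heavy one, and 8.0 for the large-context one.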
6 changes: 0 additions & 6 deletions src/guidellm/benchmark/scenarios/rag.json

This file was deleted.
