diff --git a/examples/serve/compatibility/README.md b/examples/serve/compatibility/README.md
new file mode 100644
index 00000000000..c21ec79afe5
--- /dev/null
+++ b/examples/serve/compatibility/README.md
@@ -0,0 +1,75 @@
+# OpenAI API Compatibility Examples
+
+This directory contains individual, self-contained examples demonstrating TensorRT-LLM's OpenAI API compatibility. Examples are organized by API endpoint.
+
+## Prerequisites
+
+1. **Start the trtllm-serve server:**
+
+   ```bash
+   trtllm-serve meta-llama/Llama-3.1-8B-Instruct
+   ```
+
+   For reasoning models or models with tool-calling ability, also specify `--reasoning_parser` and `--tool_parser`, e.g.:
+
+   ```bash
+   trtllm-serve Qwen/Qwen3-8B --reasoning_parser "qwen3" --tool_parser "qwen3"
+   ```
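+
+2. **Optionally, verify the server is reachable.** A minimal sketch that lists the served models, using the same client settings as the examples:
+
+   ```python
+   from openai import OpenAI
+
+   # Same defaults as the examples below; adjust if your server differs.
+   client = OpenAI(base_url="http://localhost:8000/v1", api_key="tensorrt_llm")
+   print([m.id for m in client.models.list().data])
+   ```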
+
+## Running Examples
+
+Each example is a standalone Python script. Run it from the example's directory:
+
+```bash
+# From the chat_completions directory
+cd chat_completions
+python example_01_basic_chat.py
+```
+
+Or run it with the full path from the repository root:
+
+```bash
+python examples/serve/compatibility/chat_completions/example_01_basic_chat.py
+```
+
+### 📋 Complete Example List
+
+All examples demonstrate the `/v1/chat/completions` endpoint:
+
+| Example | File | Description |
+|---------|------|-------------|
+| **01** | `example_01_basic_chat.py` | Basic non-streaming chat completion |
+| **02** | `example_02_streaming_chat.py` | Streaming responses with real-time delivery |
+| **03** | `example_03_multi_turn_conversation.py` | Multi-turn conversation with context |
+| **04** | `example_04_streaming_with_usage.py` | Streaming with continuous token usage stats |
+| **05** | `example_05_json_mode.py` | Structured output with a JSON schema |
+| **06** | `example_06_tool_calling.py` | Function/tool calling |
+| **07** | `example_07_advanced_sampling.py` | TensorRT-LLM extended sampling parameters |
+
+## Configuration
+
+All examples use these default settings:
+
+```python
+base_url = "http://localhost:8000/v1"
+api_key = "tensorrt_llm"  # Can be any string
+```
+
+To use a different server:
+
+```python
+client = OpenAI(
+    base_url="http://YOUR_SERVER:PORT/v1",
+    api_key="your_key",
+)
+```
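+
+If you switch between servers often, one option is to read these settings from environment variables instead of editing each script; a small sketch (the variable names are illustrative and are not read by the example scripts):
+
+```python
+import os
+
+from openai import OpenAI
+
+client = OpenAI(
+    base_url=os.environ.get("TRTLLM_BASE_URL", "http://localhost:8000/v1"),
+    api_key=os.environ.get("TRTLLM_API_KEY", "tensorrt_llm"),
+)
+```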
+
+## Model Requirements
+
+Some examples require specific model capabilities:
+
+| Example | Model Requirement |
+|---------|-------------------|
+| 05 (JSON Mode) | xgrammar support |
+| 06 (Tool Calling) | Tool-capable model (e.g., Llama 3.1+, Mistral Instruct) |
+| Others | Any model |
diff --git a/examples/serve/compatibility/chat_completions/README.md b/examples/serve/compatibility/chat_completions/README.md
new file mode 100644
index 00000000000..58695dceca3
--- /dev/null
+++ b/examples/serve/compatibility/chat_completions/README.md
@@ -0,0 +1,100 @@
+# Chat Completions API Examples
+
+Examples for the `/v1/chat/completions` endpoint - the most versatile API for conversational AI.
+
+## Quick Start
+
+```bash
+# Run the basic example
+python example_01_basic_chat.py
+```
+
+## Examples Overview
+
+### Basic Examples
+
+1. **`example_01_basic_chat.py`** - Start here!
+   - Simple request/response
+   - Shows token usage
+   - Non-streaming mode
+
+2. **`example_02_streaming_chat.py`** - Real-time responses
+   - Streams tokens as they are generated
+   - Better UX for long responses
+   - Server-Sent Events (SSE)
+
+3. **`example_03_multi_turn_conversation.py`** - Context management
+   - Multiple conversation turns
+   - Conversation history
+   - Follow-up questions
+
+4. **`example_04_streaming_with_usage.py`** - Streaming + metrics
+   - Continuous token counts
+   - `stream_options` parameter
+   - Monitors resource usage
+
+### Advanced Examples
+
+5. **`example_05_json_mode.py`** - Structured output
+   - JSON schema validation
+   - Structured data extraction
+   - Requires xgrammar
+
+6. **`example_06_tool_calling.py`** - Function calling
+   - External tool integration
+   - Function definitions
+   - Requires a compatible model (e.g., Qwen3, gpt_oss)
+
+7. **`example_07_advanced_sampling.py`** - Fine-grained control
+   - `top_k`, `repetition_penalty`
+   - Custom stop sequences
+   - TensorRT-LLM extensions
+
+## Key Concepts
+
+### Non-Streaming vs Streaming
+
+**Non-Streaming** (`stream=False`):
+- Waits for the complete response
+- Single response object
+- Simple to use
+
+**Streaming** (`stream=True`):
+- Tokens delivered as they are generated
+- Better perceived latency
+- Server-Sent Events (SSE)
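+
+A minimal sketch of both modes, assuming a `client` and `model` set up as in the examples:
+
+```python
+# Non-streaming: block until the full response is ready.
+response = client.chat.completions.create(
+    model=model,
+    messages=[{"role": "user", "content": "Hello"}],
+)
+print(response.choices[0].message.content)
+
+# Streaming: iterate over chunks as tokens are generated.
+stream = client.chat.completions.create(
+    model=model,
+    messages=[{"role": "user", "content": "Hello"}],
+    stream=True,
+)
+for chunk in stream:
+    if chunk.choices and chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end="", flush=True)
+```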
+
+### Conversation Context
+
+Messages accumulate in the `messages` array:
+
+```python
+messages = [
+    {"role": "system", "content": "You are helpful."},
+    {"role": "user", "content": "Hello"},
+    {"role": "assistant", "content": "Hi there!"},
+    {"role": "user", "content": "How are you?"},  # Next turn
+]
+```
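+
+To carry context across turns, append the assistant's reply before sending the next user message; a sketch of the resulting loop (same `client`/`model` assumptions as above):
+
+```python
+messages = [{"role": "system", "content": "You are helpful."}]
+for user_input in ["Hello", "How are you?"]:
+    messages.append({"role": "user", "content": user_input})
+    response = client.chat.completions.create(model=model, messages=messages)
+    reply = response.choices[0].message.content
+    messages.append({"role": "assistant", "content": reply})
+    print(reply)
+```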
+
+### Tool Calling
+
+Define functions the model can call (see `example_06_tool_calling.py` for a complete definition and round trip):
+
+```python
+tools = [{
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "parameters": {...}
+    }
+}]
+```
+
+## Model Requirements
+
+| Feature | Requirement |
+|---------|-------------|
+| Basic chat | Any model |
+| Streaming | Any model |
+| Multi-turn | Any model |
+| JSON mode | xgrammar support |
+| Tool calling | Compatible model (e.g., Qwen3, gpt_oss) |
diff --git a/examples/serve/compatibility/chat_completions/example_01_basic_chat.py b/examples/serve/compatibility/chat_completions/example_01_basic_chat.py
new file mode 100644
index 00000000000..64d3f8ab400
--- /dev/null
+++ b/examples/serve/compatibility/chat_completions/example_01_basic_chat.py
@@ -0,0 +1,58 @@
+#!/usr/bin/env python3
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Example 1: Basic Non-Streaming Chat Completion.
+
+Demonstrates a simple chat completion request with the OpenAI-compatible API.
+"""
+
+from openai import OpenAI
+
+# Initialize the client
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="tensorrt_llm",
+)
+
+# Get the model name from the server
+models = client.models.list()
+model = models.data[0].id
+
+print("=" * 80)
+print("Example 1: Basic Non-Streaming Chat Completion")
+print("=" * 80)
+print()
+
+# Create a simple chat completion
+response = client.chat.completions.create(
+    model=model,
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is the capital of France?"},
+    ],
+    max_tokens=4096,
+    temperature=0.7,
+)
+
+# Print the response
+print("Response:")
+print(f"Content: {response.choices[0].message.content}")
+print(f"Finish reason: {response.choices[0].finish_reason}")
+print(
+    f"Tokens used: {response.usage.total_tokens} "
+    f"(prompt: {response.usage.prompt_tokens}, "
+    f"completion: {response.usage.completion_tokens})"
+)
diff --git a/examples/serve/compatibility/chat_completions/example_02_streaming_chat.py b/examples/serve/compatibility/chat_completions/example_02_streaming_chat.py
new file mode 100644
index 00000000000..71343821088
--- /dev/null
+++ b/examples/serve/compatibility/chat_completions/example_02_streaming_chat.py
@@ -0,0 +1,76 @@
+#!/usr/bin/env python3
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Example 2: Streaming Chat Completion.
+
+Demonstrates streaming responses with real-time token delivery.
+Reasoning content, when the server provides it, is printed separately.
+"""
+
+from openai import OpenAI
+
+# Initialize the client
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="tensorrt_llm",
+)
+
+# Get the model name from the server
+models = client.models.list()
+model = models.data[0].id
+
+print("=" * 80)
+print("Example 2: Streaming Chat Completion")
+print("=" * 80)
+print()
+
+print("Prompt: Write a haiku about artificial intelligence\n")
+
+# Create a streaming chat completion
+stream = client.chat.completions.create(
+    model=model,
+    messages=[{"role": "user", "content": "Write a haiku about artificial intelligence"}],
+    max_tokens=4096,
+    temperature=0.8,
+    stream=True,
+)
+
+# Print tokens as they arrive
+print("Response (streaming):")
+print("Assistant: ", end="", flush=True)
+
+current_state = "none"
+for chunk in stream:
+    has_content = hasattr(chunk.choices[0].delta, "content") and chunk.choices[0].delta.content
+    has_reasoning_content = (
+        hasattr(chunk.choices[0].delta, "reasoning_content")
+        and chunk.choices[0].delta.reasoning_content
+    )
+    if has_content:
+        # Label the transition from reasoning tokens to answer tokens.
+        if current_state != "content":
+            print("Content: ", end="", flush=True)
+            current_state = "content"
+
+        print(chunk.choices[0].delta.content, end="", flush=True)
+
+    if has_reasoning_content:
+        if current_state != "reasoning_content":
+            print("Reasoning: ", end="", flush=True)
+            current_state = "reasoning_content"
+
+        print(chunk.choices[0].delta.reasoning_content, end="", flush=True)
+print("\n")
+
+print(f"Finish reason: {chunk.choices[0].finish_reason}")
diff --git a/examples/serve/compatibility/chat_completions/example_03_multi_turn_conversation.py b/examples/serve/compatibility/chat_completions/example_03_multi_turn_conversation.py
new file mode 100644
index 00000000000..9edbc28faa0
--- /dev/null
+++ b/examples/serve/compatibility/chat_completions/example_03_multi_turn_conversation.py
@@ -0,0 +1,73 @@
+#!/usr/bin/env python3
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Example 3: Multi-turn Conversation.
+
+Demonstrates maintaining conversation context across multiple turns.
+"""
+
+from openai import OpenAI
+
+# Initialize the client
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="tensorrt_llm",
+)
+
+# Get the model name from the server
+models = client.models.list()
+model = models.data[0].id
+
+print("=" * 80)
+print("Example 3: Multi-turn Conversation")
+print("=" * 80)
+print()
+
+# Start a conversation with a system message
+messages = [
+    {"role": "system", "content": "You are an expert mathematician."},
+]
+
+# First turn: the user asks a question
+messages.append({"role": "user", "content": "What is 15 multiplied by 23?"})
+print("USER: What is 15 multiplied by 23?")
+
+response1 = client.chat.completions.create(
+    model=model,
+    messages=messages,
+    max_tokens=4096,
+    temperature=0,
+)
+
+assistant_reply_1 = response1.choices[0].message.content
+print(f"ASSISTANT: {assistant_reply_1}\n")
+
+# Add the assistant's response to the conversation history
+messages.append({"role": "assistant", "content": assistant_reply_1})
+
+# Second turn: the user asks a follow-up question
+messages.append({"role": "user", "content": "Now divide that result by 5"})
+print("USER: Now divide that result by 5")
+
+response2 = client.chat.completions.create(
+    model=model,
+    messages=messages,
+    max_tokens=4096,
+    temperature=0,
+)
+
+assistant_reply_2 = response2.choices[0].message.content
+print(f"ASSISTANT: {assistant_reply_2}")
diff --git a/examples/serve/compatibility/chat_completions/example_04_streaming_with_usage.py b/examples/serve/compatibility/chat_completions/example_04_streaming_with_usage.py
new file mode 100644
index 00000000000..30dc20b7241
--- /dev/null
+++ b/examples/serve/compatibility/chat_completions/example_04_streaming_with_usage.py
@@ -0,0 +1,81 @@
+#!/usr/bin/env python3
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Example 4: Streaming with Usage Statistics.
+
+Demonstrates streaming responses with continuous token usage updates.
+"""
+
+from openai import OpenAI
+
+# Initialize the client
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="tensorrt_llm",
+)
+
+# Get the model name from the server
+models = client.models.list()
+model = models.data[0].id
+
+print("=" * 80)
+print("Example 4: Streaming with Usage Statistics")
+print("=" * 80)
+print()
+
+print("Request: Streaming with continuous usage stats enabled\n")
+
+# Create a streaming request with usage statistics
+stream = client.chat.completions.create(
+    model=model,
+    messages=[{"role": "user", "content": "Count from 1 to 5"}],
+    max_tokens=4096,
+    stream=True,
+    stream_options={"include_usage": True, "continuous_usage_stats": True},
+)
+
+print("Response with token counts and reasoning (if available):")
+chunk = None
+current_state = "none"
+for chunk in stream:
+    # Usage-only chunks (e.g., the final one) carry no choices.
+    if len(chunk.choices) == 0:
+        continue
+
+    has_content = hasattr(chunk.choices[0].delta, "content") and chunk.choices[0].delta.content
+    has_reasoning_content = (
+        hasattr(chunk.choices[0].delta, "reasoning_content")
+        and chunk.choices[0].delta.reasoning_content
+    )
+    if has_content:
+        if current_state != "content":
+            print("Content: ", end="", flush=True)
+            current_state = "content"
+
+        print(chunk.choices[0].delta.content, end="", flush=True)
+
+    if has_reasoning_content:
+        if current_state != "reasoning_content":
+            print("Reasoning: ", end="", flush=True)
+            current_state = "reasoning_content"
+
+        print(chunk.choices[0].delta.reasoning_content, end="", flush=True)
+print()
+
+# The last chunk seen in the loop carries the final usage statistics.
+print(
+    f"Tokens used: {chunk.usage.total_tokens} "
+    f"(prompt: {chunk.usage.prompt_tokens}, "
+    f"completion: {chunk.usage.completion_tokens})"
+)
diff --git a/examples/serve/compatibility/chat_completions/example_05_json_mode.py b/examples/serve/compatibility/chat_completions/example_05_json_mode.py
new file mode 100644
index 00000000000..6d4430e6d72
--- /dev/null
+++ b/examples/serve/compatibility/chat_completions/example_05_json_mode.py
@@ -0,0 +1,80 @@
+#!/usr/bin/env python3
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Example 5: JSON Mode with Schema.
+
+Demonstrates structured output generation with JSON schema validation.
+
+Note: This requires xgrammar support and compatible model configuration.
+"""
+
+import json
+
+from openai import OpenAI
+
+# Initialize the client
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="tensorrt_llm",
+)
+
+# Get the model name from the server
+models = client.models.list()
+model = models.data[0].id
+
+print("=" * 80)
+print("Example 5: JSON Mode with Schema")
+print("=" * 80)
+print()
+
+# Define the JSON schema
+schema = {
+    "name": "city_info",
+    "schema": {
+        "type": "object",
+        "properties": {
+            "name": {"type": "string"},
+            "country": {"type": "string"},
+            "population": {"type": "integer"},
+            "famous_for": {"type": "array", "items": {"type": "string"}},
+        },
+        "required": ["name", "country", "population"],
+    },
+}
+
+print("Request with JSON schema:")
+print(json.dumps(schema, indent=2))
+print()
+print("Note: JSON schema support requires xgrammar and compatible model configuration.\n")
+
+try:
+    # Create a chat completion constrained by the JSON schema
+    response = client.chat.completions.create(
+        model=model,
+        messages=[
+            {"role": "system", "content": "You are a helpful assistant that outputs JSON."},
+            {"role": "user", "content": "Give me information about Tokyo."},
+        ],
+        response_format={"type": "json_schema", "json_schema": schema},
+        max_tokens=4096,
+    )
+
+    print("JSON Response:")
+    result = json.loads(response.choices[0].message.content)
+    print(json.dumps(result, indent=2))
+except Exception as e:
+    print("JSON schema support requires xgrammar and proper configuration.")
+    print(f"Error: {e}")
diff --git a/examples/serve/compatibility/chat_completions/example_06_tool_calling.py b/examples/serve/compatibility/chat_completions/example_06_tool_calling.py
new file mode 100644
index 00000000000..5f0b0289685
--- /dev/null
+++ b/examples/serve/compatibility/chat_completions/example_06_tool_calling.py
@@ -0,0 +1,120 @@
+#!/usr/bin/env python3
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Example 6: Tool/Function Calling.
+
+Demonstrates tool calling with function definitions and responses.
+
+Note: This requires a compatible model (e.g., Llama 3.1+, Mistral Instruct).
+"""
+
+import json
+
+from openai import OpenAI
+
+# Initialize the client
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="tensorrt_llm",
+)
+
+# Get the model name from the server
+models = client.models.list()
+model = models.data[0].id
+
+print("=" * 80)
+print("Example 6: Tool/Function Calling")
+print("=" * 80)
+print()
+print("Note: Tool calling requires compatible models (e.g., Llama 3.1+)\n")
+
+# Define the available tools
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "get_weather",
+            "description": "Get the current weather in a location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "City and state, e.g. San Francisco, CA",
+                    },
+                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                },
+                "required": ["location"],
+            },
+        },
+    }
+]
+
+
+def get_weather(location: str, unit: str = "fahrenheit") -> dict:
+    """Simulate a weather lookup; a real implementation would call an external API."""
+    return {"location": location, "temperature": 68, "unit": unit, "conditions": "sunny"}
+
+
+print("Available tools:")
+print(json.dumps(tools, indent=2))
+print("\nUser query: What is the weather in San Francisco?\n")
+
+try:
+    # Initial request with tools
+    response = client.chat.completions.create(
+        model=model,
+        messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
+        tools=tools,
+        tool_choice="auto",
+        max_tokens=4096,
+    )
+
+    message = response.choices[0].message
+
+    if message.tool_calls:
+        print("Tool calls requested:")
+        for tool_call in message.tool_calls:
+            print(f"  Function: {tool_call.function.name}")
+            print(f"  Arguments: {tool_call.function.arguments}")
+
+            # Simulate function execution
+            print("\nSimulating function execution...")
+            function_response = get_weather(**json.loads(tool_call.function.arguments))
+            print(f"Function result: {json.dumps(function_response, indent=2)}")
+
+        # Send the function result back to get the final response
+        messages = [
+            {"role": "user", "content": "What is the weather in San Francisco?"},
+            message,
+            {
+                "role": "tool",
+                "tool_call_id": message.tool_calls[0].id,
+                "content": json.dumps(function_response),
+            },
+        ]
+
+        final_response = client.chat.completions.create(
+            model=model,
+            messages=messages,
+            max_tokens=4096,
+        )
+
+        print(f"\nFinal response: {final_response.choices[0].message.content}")
+    else:
+        print(f"Direct response: {message.content}")
+except Exception as e:
+    print("Note: Tool calling requires model support (e.g., Llama 3.1+ models)")
+    print(f"Error: {e}")
diff --git a/examples/serve/compatibility/chat_completions/example_07_advanced_sampling.py b/examples/serve/compatibility/chat_completions/example_07_advanced_sampling.py
new file mode 100644
index 00000000000..ae0899b3449
--- /dev/null
+++ b/examples/serve/compatibility/chat_completions/example_07_advanced_sampling.py
@@ -0,0 +1,63 @@
+#!/usr/bin/env python3
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Example 7: Advanced Sampling Parameters.
+
+Demonstrates TensorRT-LLM-specific sampling parameters for fine-grained control.
+"""
+
+from openai import OpenAI
+
+# Initialize the client
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="tensorrt_llm",
+)
+
+# Get the model name from the server
+models = client.models.list()
+model = models.data[0].id
+
+print("=" * 80)
+print("Example 7: Advanced Sampling Parameters")
+print("=" * 80)
+print()
+
+print("Using TensorRT-LLM extended parameters:")
+print(" - top_k: 50")
+print(" - repetition_penalty: 1.1")
+print(" - min_tokens: 20")
+print(" - stop sequences: ['The End', '\\n\\n\\n']")
+print()
+
+# Create a completion with advanced sampling parameters.
+# Parameters outside the OpenAI spec are passed via extra_body.
+response = client.chat.completions.create(
+    model=model,
+    messages=[{"role": "user", "content": "Write a very short story about a robot."}],
+    max_tokens=4096,
+    temperature=0.8,
+    top_p=0.95,
+    extra_body={
+        "top_k": 50,
+        "repetition_penalty": 1.1,
+        "min_tokens": 20,
+        "stop": ["The End", "\n\n\n"],
+    },
+)
+
+print("Story:")
+print(response.choices[0].message.content)
+print(f"\nFinish reason: {response.choices[0].finish_reason}")