diff --git a/site/content/ai-suite/graphrag/web-interface.md b/site/content/ai-suite/graphrag/web-interface.md index 927da744a2..01d0d19f2c 100644 --- a/site/content/ai-suite/graphrag/web-interface.md +++ b/site/content/ai-suite/graphrag/web-interface.md @@ -178,9 +178,11 @@ See also the [Retriever](../reference/retriever.md) documentation. ## Chat with your Knowledge Graph -The chat interface provides two search methods: -- **Instant search**: Instant queries provide fast responses. -- **Deep search**: This option will take longer to return a response. +The Retriever service provides two search methods: +- [Instant search](../reference/retriever.md#instant-search): Instant + queries provide fast responses. +- [Deep search](../reference/retriever.md#deep-search): This option will take + longer to return a response. In addition to querying the Knowledge Graph, the chat service allows you to do the following: - Switch the search method from **Instant search** to **Deep search** and vice-versa diff --git a/site/content/ai-suite/reference/gen-ai.md b/site/content/ai-suite/reference/gen-ai.md index f545a7e255..af436ce37d 100644 --- a/site/content/ai-suite/reference/gen-ai.md +++ b/site/content/ai-suite/reference/gen-ai.md @@ -33,22 +33,15 @@ in the platform. All services support the `profiles` field, which you can use to define the profile to use for the service. For example, you can define a GPU profile that enables the service to run an LLM on GPU resources. -## LLM Host Service Creation Request Body +## Service Creation Request Body -```json -{ - "env": { - "model_name": "" - } -} -``` - -## Using Labels in Creation Request Body +The following example shows a complete request body with all available options: ```json { "env": { - "model_name": "" + "model_name": "", + "profiles": "gpu,internal" }, "labels": { "key1": "value1", @@ -57,32 +50,120 @@ GPU profile that enables the service to run an LLM on GPU resources. } ``` -{{< info >}} -Labels are optional. Labels can be used to filter and identify services in -the Platform. If you want to use labels, define them as a key-value pair in `labels` -within the `env` field. -{{< /info >}} +**Optional fields:** + +- **labels**: Key-value pairs used to filter and identify services in the platform. +- **profiles**: A comma-separated string defining which profiles to use for the + service (e.g., `"gpu,internal"`). If not set, the service is created with the + default profile. Profiles must be present and created in the platform before + they can be used. + +The parameters required for the deployment of each service are defined in the +corresponding service documentation. See [Importer](importer.md) +and [Retriever](retriever.md). + +## Projects + +Projects help you organize your GraphRAG work by grouping related services and +keeping your data separate. When the Importer service creates ArangoDB collections +(such as documents, chunks, entities, relationships, and communities), it uses +your project name as a prefix. For example, a project named `docs` will have +collections like `docs_Documents`, `docs_Chunks`, and so on. 
-## Using Profiles in Creation Request Body +Projects are required for the following services: +- Importer +- Retriever + +### Creating a project + +To create a new GraphRAG project, send a POST request to the project endpoint: + +```bash +curl -X POST "https://:8529/gen-ai/v1/project" \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{ + "project_name": "docs", + "project_type": "graphrag", + "project_db_name": "documentation", + "project_description": "A documentation project for GraphRAG." + }' +``` + +Where: +- **project_name** (required): Unique identifier for your project. Must be 1-63 + characters and contain only letters, numbers, underscores (`_`), and hyphens (`-`). +- **project_type** (required): Type of project (e.g., `"graphrag"`). +- **project_db_name** (required): The ArangoDB database name where the project + will be created. +- **project_description** (optional): A description of your project. + +Once created, you can reference your project in service deployments using the +`genai_project_name` field: ```json { - "env": { - "model_name": "", - "profiles": "gpu,internal" - } + "env": { + "genai_project_name": "docs" + } } ``` -{{< info >}} -The `profiles` field is optional. If it is not set, the service is created with -the default profile. Profiles must be present and created in the Platform before -they can be used. If you want to use profiles, define them as a comma-separated -string in `profiles` within the `env` field. -{{< /info >}} +### Listing projects -The parameters required for the deployment of each service are defined in the -corresponding service documentation. +**List all project names in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_project_names/" \ + -H "Authorization: Bearer " +``` + +This returns only the project names for quick reference. + +**List all projects with full metadata in a database:** + +```bash +curl -X GET "https://:8529/gen-ai/v1/all_projects/" \ + -H "Authorization: Bearer " +``` + +This returns complete project objects including metadata, associated services, +and knowledge graph information. + +### Getting project details + +Retrieve comprehensive metadata for a specific project: + +```bash +curl -X GET "https://:8529/gen-ai/v1/project_by_name//" \ + -H "Authorization: Bearer " +``` + +The response includes: +- Project configuration +- Associated Importer and Retriever services +- Knowledge graph metadata +- Service status information +- Last modification timestamp + +### Deleting a project + +Remove a project's metadata from the GenAI service: + +```bash +curl -X DELETE "https://:8529/gen-ai/v1/project//" \ + -H "Authorization: Bearer " +``` + +{{< warning >}} +Deleting a project only removes the project metadata from the GenAI service. +It does **not** delete: +- Services associated with the project (must be deleted separately) +- ArangoDB collections and data +- Knowledge graphs + +You must manually delete services and collections if needed. +{{< /warning >}} ## Obtaining a Bearer Token @@ -101,7 +182,7 @@ documentation. ## Complete Service lifecycle example -The example below shows how to install, monitor, and uninstall the Importer service. +The example below shows how to install, monitor, and uninstall the [Importer](importer.md) service. 
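+The steps below use placeholder values for the deployment URL and the Bearer token. +As a convenience, you can keep these in shell variables and substitute them into the +commands that follow. This is only a sketch: the variable names are arbitrary, and how +you obtain the token depends on your deployment (see **Obtaining a Bearer Token** above). + +```bash +# Hypothetical helper variables for the lifecycle example below. +# Replace the values with your actual deployment URL and Bearer token. +export PLATFORM_URL="https://:8529" +export PLATFORM_TOKEN="" + +# Every request in the example then sends the token in the Authorization header, for example: +# curl -X POST "$PLATFORM_URL/ai/v1/graphragimporter" -H "Authorization: Bearer $PLATFORM_TOKEN" ... +```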
### Step 1: Installing the service @@ -111,15 +192,30 @@ curl -X POST https://:8529/ai/v1/graphragimporter \ -H "Content-Type: application/json" \ -d '{ "env": { - "username": "", "db_name": "", - "api_provider": "", - "triton_url": "", - "triton_model": "" + "chat_api_provider": "", + "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "gpt-4o", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openai_api_key", + "embedding_api_key": "your_openai_api_key" } }' ``` +Where: +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service + **Response:** ```json @@ -176,16 +272,6 @@ curl -X DELETE https://:8529/ai/v1/service/arangodb-graphrag-i - **Authentication**: All requests use the same Bearer token in the `Authorization` header {{< /info >}} -### Customizing the example - -Replace the following values with your actual configuration: -- `` - Your database username. -- `` - Target database name. -- `` - Your API provider (e.g., `triton`) -- `` - Your LLM host service URL. -- `` - Your Triton model name (e.g., `mistral-nemo-instruct`). -- `` - Your authentication token. - ## Service configuration The AI orchestrator service is **started by default**. diff --git a/site/content/ai-suite/reference/importer.md b/site/content/ai-suite/reference/importer.md index e4cce5d200..7edb4ea50a 100644 --- a/site/content/ai-suite/reference/importer.md +++ b/site/content/ai-suite/reference/importer.md @@ -28,165 +28,175 @@ different concepts in your document with the Retriever service. You can also use the GraphRAG Importer service via the [Data Platform web interface](../graphrag/web-interface.md). {{< /tip >}} -## Creating a new project - -To create a new GraphRAG project, use the `CreateProject` method by sending a -`POST` request to the `/ai/v1/project` endpoint. You must provide a unique -`project_name` and a `project_type` in the request body. Optionally, you can -provide a `project_description`. - -```curl -curl -X POST "https://:8529/ai/v1/project" \ --H "Content-Type: application/json" \ --d '{ - "project_name": "docs", - "project_type": "graphrag", - "project_description": "A documentation project for GraphRAG." -}' -``` -All the relevant ArangoDB collections (such as documents, chunks, entities, -relationships, and communities) created during the import process will -have the project name as a prefix. For example, the Documents collection will -become `_Documents`. The Knowledge Graph will also use the project -name as a prefix. If no project name is specified, then all collections -are prefixed with `default_project`, e.g., `default_project_Documents`. 
- -### Project metadata +## Prerequisites -Additional project metadata is accessible via the following endpoint, replacing -`` with the actual name of your project: +Before importing data, you need to create a GraphRAG project. Projects help you +organize your work and keep your data separate from other projects. -``` -GET /ai/v1/project_by_name/ -``` +For detailed instructions on creating and managing projects, see the +[Projects](gen-ai.md#projects) section in the GenAI Orchestration Service +documentation. -The endpoint provides comprehensive metadata about your project's components, -including its importer and retriever services and their status. +Once you have created a project, you can reference it when deploying the Importer +service using the `genai_project_name` field in the service configuration. ## Deployment options You can choose between two deployment options based on your needs. -### Private LLM +### Triton Inference Server If you're working in an air-gapped environment or need to keep your data -private, you can use the private LLM mode with Triton Inference Server. +private, you can use Triton Inference Server. This option allows you to run the service completely within your own infrastructure. The Triton Inference Server is a crucial component when -running in private LLM mode. It serves as the backbone for running your +running with self-hosted models. It serves as the backbone for running your language (LLM) and embedding models on your own machines, ensuring your data never leaves your infrastructure. The server handles all the complex model operations, from processing text to generating embeddings, and provides both HTTP and gRPC interfaces for communication. -### Public LLM +### OpenAI-compatible APIs Alternatively, if you prefer a simpler setup and don't have specific privacy -requirements, you can use the public LLM mode. This option connects to cloud-based +requirements, you can use OpenAI-compatible APIs. This option connects to cloud-based services like OpenAI's models via the OpenAI API or a large array of models (Gemini, Anthropic, publicly hosted open-source models, etc.) via the OpenRouter option. +It also works with private corporate LLMs that expose an OpenAI-compatible endpoint. ## Installation and configuration -The Importer service can be configured to use either: -- Triton Inference Server (for private LLM deployments) -- OpenAI (for public LLM deployments) -- OpenRouter (for public LLM deployments) +The Importer service can be configured to use either Triton Inference Server or any +OpenAI-compatible API. OpenAI-compatible APIs work with public providers (OpenAI, +OpenRouter, Gemini, Anthropic) as well as private corporate LLMs that expose an +OpenAI-compatible endpoint. To start the service, use the AI service endpoint `/v1/graphragimporter`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using Triton Inference Server (Private LLM) +### Using OpenAI-compatible APIs -The first step is to install the LLM Host service with the LLM and -embedding models of your choice. The setup will the use the -Triton Inference Server and MLflow at the backend. -For more details, please refer to the [Triton Inference Server](triton-inference-server.md) -and [Mlflow](mlflow.md) documentation. 
+The `openai` provider works with any OpenAI-compatible API, including: +- OpenAI (official API) +- OpenRouter +- Google Gemini +- Anthropic Claude +- Corporate or self-hosted LLMs with OpenAI-compatible endpoints -Once the `llmhost` service is up-and-running, then you can start the Importer -service using the below configuration: +set the `chat_api_url` and `embedding_api_url` to point to your provider's endpoint. -```json -{ - "env": { - "username": "your_username", - "db_name": "your_database_name", - "api_provider": "triton", - "triton_url": "your-arangodb-llm-host-url", - "triton_model": "mistral-nemo-instruct" - }, -} -``` - -Where: -- `username`: ArangoDB database user with permissions to create and modify collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored. -- `api_provider`: Specifies which LLM provider to use. -- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running. -- `triton_model`: Name of the LLM model to use for text processing. - -### Using OpenAI (Public LLM) +**Example using OpenAI:** ```json { "env": { - "openai_api_key": "your_openai_api_key", - "username": "your_username", "db_name": "your_database_name", - "api_provider": "openai" + "chat_api_provider": "openai", + "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "gpt-4o", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openai_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to create and modify collections - `db_name`: Name of the ArangoDB database where the knowledge graph will be stored -- `api_provider`: Specifies which LLM provider to use -- `openai_api_key`: Your OpenAI API key +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -By default, for OpenAI API, the service is using -`gpt-4o-mini` and `text-embedding-3-small` models as LLM and -embedding model respectively. +When using the official OpenAI API, the service defaults to `gpt-4o` and +`text-embedding-3-small` models. {{< /info >}} -### Using OpenRouter (Gemini, Anthropic, etc.) +### Using different OpenAI-compatible services for chat and embedding + +You can use different OpenAI-compatible services for chat and embedding. For example, +you might use OpenRouter for chat and OpenAI for embeddings, depending +on your needs for performance, cost, or model availability. -OpenRouter makes it possible to connect to a huge array of LLM API -providers, including non-OpenAI LLMs like Gemini Flash, Anthropic Claude -and publicly hosted open-source models. +{{< info >}} +Both `chat_api_provider` and `embedding_api_provider` must be set to the same value +(either both `"openai"` or both `"triton"`). You cannot mix Triton and OpenAI-compatible +APIs. 
However, you can use different OpenAI-compatible services (like OpenRouter, OpenAI, +Gemini, etc.) by setting both providers to `"openai"` and differentiating them with +different URLs in `chat_api_url` and `embedding_api_url`. +{{< /info >}} -When using the OpenRouter option, the LLM responses are served via OpenRouter -while OpenAI is used for the embedding model. +**Example using OpenRouter for chat and OpenAI for embedding:** ```json { "env": { "db_name": "your_database_name", - "username": "your_username", - "api_provider": "openrouter", - "openai_api_key": "your_openai_api_key", - "openrouter_api_key": "your_openrouter_api_key", - "openrouter_model": "mistralai/mistral-nemo" // Specify a model here + "chat_api_provider": "openai", + "embedding_api_provider": "openai", + "chat_api_url": "https://openrouter.ai/api/v1", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "mistral-nemo", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openrouter_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to access collections -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored -- `api_provider`: Specifies which LLM provider to use -- `openai_api_key`: Your OpenAI API key (for the embedding model) -- `openrouter_api_key`: Your OpenRouter API key (for the LLM) -- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`) +- `db_name`: Name of the ArangoDB database where the knowledge graph is stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service (in this example, OpenRouter) +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service (in this example, OpenAI) +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service + +### Using Triton Inference Server for chat and embedding -{{< info >}} -When using OpenRouter, the service defaults to `mistral-nemo` for generation -(via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI). -{{< /info >}} +The first step is to install the LLM Host service with the LLM and +embedding models of your choice. The setup will then use the +Triton Inference Server and MLflow at the backend. +For more details, please refer to the [Triton Inference Server](triton-inference-server.md) +and [Mlflow](mlflow.md) documentation.
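+To confirm that the LLM Host service is reachable before you deploy the Importer, you can +query Triton's standard readiness endpoint. This is only a quick sanity check, and it assumes +that `your-arangodb-llm-host-url` (the URL of your `llmhost` service) exposes Triton's HTTP interface: + +```bash +# Prints the HTTP status code; 200 means the Triton server behind the llmhost service is ready. +curl -s -o /dev/null -w "%{http_code}\n" "your-arangodb-llm-host-url/v2/health/ready" +```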
+ +Once the `llmhost` service is up-and-running, then you can start the Importer +service using the below configuration: + +```json +{ + "env": { + "db_name": "your_database_name", + "chat_api_provider": "triton", + "embedding_api_provider": "triton", + "chat_api_url": "your-arangodb-llm-host-url", + "embedding_api_url": "your-arangodb-llm-host-url", + "chat_model": "mistral-nemo-instruct", + "embedding_model": "nomic-embed-text-v1" + }, +} +``` + +Where: +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Specifies which LLM provider to use for language model services +- `embedding_api_provider`: API provider for embedding model services (e.g., "triton") +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings ## Building Knowledge Graphs diff --git a/site/content/ai-suite/reference/retriever.md b/site/content/ai-suite/reference/retriever.md index 5949d8a369..47b4c0a92e 100644 --- a/site/content/ai-suite/reference/retriever.md +++ b/site/content/ai-suite/reference/retriever.md @@ -14,214 +14,347 @@ the Arango team. ## Overview -The Retriever service offers two distinct search methods: -- **Global search**: Analyzes entire document to identify themes and patterns, - perfect for high-level insights and comprehensive summaries. -- **Local search**: Focuses on specific entities and their relationships, ideal - for detailed queries about particular concepts. - -The service supports both private (Triton Inference Server) and public (OpenAI) -LLM deployments, making it flexible for various security and infrastructure -requirements. With simple HTTP endpoints, you can easily query your knowledge -graph and get contextually relevant responses. +The Retriever service provides intelligent search and retrieval from knowledge graphs, +with multiple search methods optimized for different query types. The service supports +LLMs through Triton Inference Server or any OpenAI-compatible API (including private +corporate LLMs), making it flexible for various deployment and infrastructure requirements. **Key features:** -- Dual search methods for different query types -- Support for both private and public LLM deployments +- Multiple search methods optimized for different use cases +- Streaming support for real-time responses for `UNIFIED` queries +- Optional LLM orchestration for `LOCAL` queries +- Configurable community hierarchy levels for `GLOBAL` queries +- Support for Triton Inference Server and OpenAI-compatible APIs - Simple REST API interface - Integration with ArangoDB knowledge graphs -- Configurable community hierarchy levels {{< tip >}} -You can also use the GraphRAG Retriever service via the ArangoDB [web interface](../graphrag/web-interface.md). +You can use the Retriever service via the [web interface](../graphrag/web-interface.md) +for Instant and Deep Search, or through the API for full control over all query types. {{< /tip >}} -## Search methods +## Prerequisites + +Before using the Retriever service, you need to: + +1. **Create a GraphRAG project** - For detailed instructions on creating and + managing projects, see the [Projects](gen-ai.md#projects) section in the + GenAI Orchestration Service documentation. + +2. 
**Import data** - Use the [Importer](importer.md) service to transform your + text documents into a knowledge graph stored in ArangoDB. + +## Search Methods The Retriever service enables intelligent search and retrieval of information -from your knowledge graph. It provides two powerful search methods, global Search -and local Search, that leverage the structured knowledge graph created by the Importer -to deliver accurate and contextually relevant responses to your natural language queries. +from your knowledge graph. It provides multiple search methods that leverage +the structured knowledge graph created by the Importer to deliver accurate and +contextually relevant responses to your natural language queries. + +### Instant Search + +Instant Search is designed for responses with very short latency. It triggers +fast unified retrieval over relevant parts of the knowledge graph via hybrid +(semantic and lexical) search and graph expansion algorithms, producing a fast, +streamed natural-language response with clickable references to the relevant documents. -### Global search +{{< info >}} +The Instant Search method is also available via the [Web interface](../graphrag/web-interface.md). +{{< /info >}} + +```json +{ + "query_type": "UNIFIED" +} +``` -Global search is designed for queries that require understanding and aggregation -of information across your entire document. It's particularly effective for questions -about overall themes, patterns, or high-level insights in your data. +### Deep Search -- **Community-Based Analysis**: Uses pre-generated community reports from your - knowledge graph to understand the overall structure and themes of your data, +Deep Search is designed for highly detailed, accurate responses that require understanding +what kind of information is available in different parts of the knowledge graph and +sequentially retrieving information in an LLM-guided research process. Use it whenever +detail and accuracy are required (for example, aggregating highly technical details) and +very short latency is not, such as when caching responses for frequently asked questions +or in agent-based and research use cases. + +{{< info >}} +The Deep Search method is also available via the [Web interface](../graphrag/web-interface.md). +{{< /info >}} + +```json +{ + "query_type": "LOCAL", + "use_llm_planner": true +} +``` + +### Global Search + +Global search is designed for queries that require understanding and aggregation of information across your entire document. It's particularly effective for questions about overall themes, patterns, or high-level insights in your data. + +- **Community-Based Analysis**: Uses pre-generated community reports from your knowledge graph to understand the overall structure and themes of your data. - **Map-Reduce Processing**: - - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points. - - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response. + - **Map Stage**: Processes community reports in parallel, generating intermediate responses with rated points. + - **Reduce Stage**: Aggregates the most important points to create a comprehensive final response. -**Best use cases**: -- "What are the main themes in the dataset?" -- "Summarize the key findings across all documents" -- "What are the most important concepts discussed?"
+```json +{ + "query_type": "GLOBAL" +} +``` -### Local search +### Local Search -Local search focuses on specific entities and their relationships within your -knowledge graph. It is ideal for detailed queries about particular concepts, -entities, or relationships. +Local search focuses on specific entities and their relationships within your knowledge graph. It is ideal for detailed queries about particular concepts, entities, or relationships. - **Entity Identification**: Identifies relevant entities from the knowledge graph based on the query. - **Context Gathering**: Collects: - - Related text chunks from original documents. - - Connected entities and their strongest relationships. - - Entity descriptions and attributes. - - Context from the community each entity belongs to. + - Related text chunks from original documents. + - Connected entities and their strongest relationships. + - Entity descriptions and attributes. + - Context from the community each entity belongs to. - **Prioritized Response**: Generates a response using the most relevant gathered information. -**Best use cases**: -- "What are the properties of [specific entity]?" -- "How is [entity A] related to [entity B]?" -- "What are the key details about [specific concept]?" +```json +{ + "query_type": "LOCAL", + "use_llm_planner": false +} +``` ## Installation -The Retriever service can be configured to use either the Triton Inference Server -(for private LLM deployments) or OpenAI/OpenRouter (for public LLM deployments). +The Retriever service can be configured to use either Triton Inference Server or any +OpenAI-compatible API. OpenAI-compatible APIs work with public providers (OpenAI, +OpenRouter, Gemini, Anthropic) as well as private corporate LLMs that expose an +OpenAI-compatible endpoint. To start the service, use the AI service endpoint `/v1/graphragretriever`. Please refer to the documentation of [AI service](gen-ai.md) for more information on how to use it. -### Using Triton Inference Server (Private LLM) +### Using OpenAI-compatible APIs -The first step is to install the LLM Host service with the LLM and -embedding models of your choice. The setup will the use the -Triton Inference Server and MLflow at the backend. -For more details, please refer to the [Triton Inference Server](triton-inference-server.md) -and [Mlflow](mlflow.md) documentation. +The `openai` provider works with any OpenAI-compatible API, including: +- OpenAI (official API) +- OpenRouter +- Google Gemini +- Anthropic Claude +- Corporate or self-hosted LLMs with OpenAI-compatible endpoints -Once the `llmhost` service is up-and-running, then you can start the Importer -service using the below configuration: +Set the `chat_api_url` and `embedding_api_url` to point to your provider's endpoint. -```json -{ - "env": { - "username": "your_username", - "db_name": "your_database_name", - "api_provider": "triton", - "triton_url": "your-arangodb-llm-host-url", - "triton_model": "mistral-nemo-instruct" - }, -} -``` - -Where: -- `username`: ArangoDB database user with permissions to access collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored. -- `api_provider`: Specifies which LLM provider to use. -- `triton_url`: URL of your Triton Inference Server instance. This should be the URL where your `llmhost` service is running. -- `triton_model`: Name of the LLM model to use for text processing. 
- -### Using OpenAI (Public LLM) +**Example using OpenAI:** ```json { "env": { - "openai_api_key": "your_openai_api_key", - "username": "your_username", "db_name": "your_database_name", - "api_provider": "openai" + "chat_api_provider": "openai", + "chat_api_url": "https://api.openai.com/v1", + "embedding_api_provider": "openai", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "gpt-4o", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openai_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to access collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored. -- `api_provider`: Specifies which LLM provider to use. -- `openai_api_key`: Your OpenAI API key. +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service {{< info >}} -By default, for OpenAI API, the service is using -`gpt-4o-mini` and `text-embedding-3-small` models as LLM and -embedding model respectively. +When using the official OpenAI API, the service defaults to `gpt-4o` and +`text-embedding-3-small` models. {{< /info >}} -### Using OpenRouter (Gemini, Anthropic, etc.) +### Using different OpenAI-compatible services for chat and embedding -OpenRouter makes it possible to connect to a huge array of LLM API providers, -including non-OpenAI LLMs like Gemini Flash, Anthropic Claude and publicly hosted -open-source models. +You can use different OpenAI-compatible services for chat and embedding. For example, +you might use OpenRouter for chat and OpenAI for embeddings, depending +on your needs for performance, cost, or model availability. -When using the OpenRouter option, the LLM responses are served via OpenRouter while -OpenAI is used for the embedding model. +{{< info >}} +Both `chat_api_provider` and `embedding_api_provider` must be set to the same value +(either both `"openai"` or both `"triton"`). You cannot mix Triton and OpenAI-compatible +APIs. However, you can use different OpenAI-compatible services (like OpenRouter, OpenAI, +Gemini, etc.) by setting both providers to `"openai"` and differentiating them with +different URLs in `chat_api_url` and `embedding_api_url`. 
+{{< /info >}} + +**Example using OpenRouter for chat and OpenAI for embedding:** ```json { "env": { "db_name": "your_database_name", - "username": "your_username", - "api_provider": "openrouter", - "openai_api_key": "your_openai_api_key", - "openrouter_api_key": "your_openrouter_api_key", - "openrouter_model": "mistralai/mistral-nemo" // Specify a model here + "chat_api_provider": "openai", + "embedding_api_provider": "openai", + "chat_api_url": "https://openrouter.ai/api/v1", + "embedding_api_url": "https://api.openai.com/v1", + "chat_model": "mistral-nemo", + "embedding_model": "text-embedding-3-small", + "chat_api_key": "your_openrouter_api_key", + "embedding_api_key": "your_openai_api_key" }, } ``` Where: -- `username`: ArangoDB database user with permissions to access collections. -- `db_name`: Name of the ArangoDB database where the knowledge graph is stored. -- `api_provider`: Specifies which LLM provider to use. -- `openai_api_key`: Your OpenAI API key (for the embedding model). -- `openrouter_api_key`: Your OpenRouter API key (for the LLM). -- `openrouter_model`: Desired LLM (optional; default is `mistral-nemo`). +- `db_name`: Name of the ArangoDB database where the knowledge graph is stored +- `chat_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `chat_api_url`: API endpoint URL for the chat/language model service (in this example, OpenRouter) +- `embedding_api_provider`: Set to `"openai"` for any OpenAI-compatible API +- `embedding_api_url`: API endpoint URL for the embedding model service (in this example, OpenAI) +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings +- `chat_api_key`: API key for authenticating with the chat/language model service +- `embedding_api_key`: API key for authenticating with the embedding model service + +### Using Triton Inference Server for chat and embedding -{{< info >}} -When using OpenRouter, the service defaults to `mistral-nemo` for generation -(via OpenRouter) and `text-embedding-3-small` for embeddings (via OpenAI). -{{< /info >}} +The first step is to install the LLM Host service with the LLM and +embedding models of your choice. The setup will use the +Triton Inference Server and MLflow at the backend. +For more details, please refer to the [Triton Inference Server](triton-inference-server.md) +and [Mlflow](mlflow.md) documentation. 
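+As with the Importer, you can optionally check that the LLM Host service is reachable and +that your models are loaded before deploying the Retriever. This sketch assumes that +`your-arangodb-llm-host-url` exposes Triton's HTTP interface and that `mistral-nemo-instruct` +is the chat model you registered: + +```bash +# Overall server readiness (expects HTTP 200). +curl -s -o /dev/null -w "%{http_code}\n" "your-arangodb-llm-host-url/v2/health/ready" + +# Readiness of a specific model (expects HTTP 200 once the model is loaded). +curl -s -o /dev/null -w "%{http_code}\n" "your-arangodb-llm-host-url/v2/models/mistral-nemo-instruct/ready" +```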
+Once the `llmhost` service is up and running, you can start the Retriever +service using the following configuration: + +```json +{ + "env": { + "db_name": "your_database_name", + "chat_api_provider": "triton", + "embedding_api_provider": "triton", + "chat_api_url": "your-arangodb-llm-host-url", + "embedding_api_url": "your-arangodb-llm-host-url", + "chat_model": "mistral-nemo-instruct", + "embedding_model": "nomic-embed-text-v1" + }, +} +``` + +Where: +- `db_name`: Name of the ArangoDB database where the knowledge graph will be stored +- `chat_api_provider`: Specifies which LLM provider to use for language model services +- `embedding_api_provider`: API provider for embedding model services (e.g., "triton") +- `chat_api_url`: API endpoint URL for the chat/language model service +- `embedding_api_url`: API endpoint URL for the embedding model service +- `chat_model`: Specific language model to use for text generation and analysis +- `embedding_model`: Specific model to use for generating text embeddings ## Executing queries After the Retriever service is installed successfully, you can interact with -it using the following HTTP endpoints, based on the selected search method. +it using the following HTTP endpoints. {{< tabs "executing-queries" >}} -{{< tab "Local search" >}} +{{< tab "Instant Search" >}} + +```bash +curl -X POST /v1/graphrag-query-stream \ + -H "Content-Type: application/json" \ + -d '{ + "query": "How are X and Y related?", + "query_type": "UNIFIED", + "provider": 0, + "include_metadata": true + }' +``` + +{{< /tab >}} + +{{< tab "Deep Search" >}} + ```bash curl -X POST /v1/graphrag-query \ -H "Content-Type: application/json" \ -d '{ - "query": "What is the AR3 Drone?", - "query_type": 2, - "provider": 0 + "query": "What are the properties of a specific entity?", + "query_type": "LOCAL", + "use_llm_planner": true, + "provider": 0, + "include_metadata": true }' ``` + {{< /tab >}} -{{< tab "Global search" >}} +{{< tab "Global Search" >}} ```bash curl -X POST /v1/graphrag-query \ -H "Content-Type: application/json" \ -d '{ - "query": "What is the AR3 Drone?", + "query": "What are the main themes discussed in the document?", + "query_type": "GLOBAL", "level": 1, - "query_type": 1, - "provider": 0 + "provider": 0, + "include_metadata": true }' ``` + +{{< /tab >}} + +{{< tab "Local Search" >}} + +```bash +curl -X POST /v1/graphrag-query \ + -H "Content-Type: application/json" \ + -d '{ + "query": "What is the AR3 Drone?", + "query_type": "LOCAL", + "use_llm_planner": false, + "provider": 0, + "include_metadata": true + }' +``` + {{< /tab >}} {{< /tabs >}} -The request parameters are the following: -- `query`: Your search query text. -- `level`: The community hierarchy level to use for the search (`1` for top-level communities). +### Request Parameters + +- `query`: Your search query text (required). + - `query_type`: The type of search to perform. - - `1`: Global search. - - `2`: Local search. + - `GLOBAL` or `1`: Global Search (default if not specified). + - `LOCAL` or `2`: Deep Search when used with the LLM planner (default), or standard Local Search when `use_llm_planner` is explicitly set to `false`. + - `UNIFIED` or `3`: Instant Search.
+ +- `use_llm_planner`: Whether to use LLM planner for intelligent query orchestration (optional) + - When enabled (default), orchestrates retrieval using both local and global strategies (powers Deep Search) + - Set to `false` for standard Local Search without orchestration + +- `level`: Community hierarchy level for analysis (only applicable for `GLOBAL` queries) + - `1` for top-level communities (broader themes) + - `2` for more granular communities (default) + - `provider`: The LLM provider to use - - `0`: OpenAI (or OpenRouter) - - `1`: Triton + - `0`: Any OpenAI-compatible API (OpenAI, OpenRouter, Gemini, Anthropic, etc.) + - `1`: Triton Inference Server + +- `include_metadata`: Whether to include metadata in the response (optional, defaults to `false`) + +- `response_instruction`: Custom instructions for response generation style (optional) + +- `use_cache`: Whether to use caching for this query (optional, defaults to `false`) + +- `show_citations`: Whether to show inline citations in the response (optional, defaults to `false`) ## Health check @@ -249,17 +382,6 @@ properties: } ``` -## Best Practices - -- **Choose the right search method**: - - Use global search for broad, thematic queries. - - Use local search for specific entity or relationship queries. - - -- **Performance considerations**: - - Global search may take longer due to its map-reduce process. - - Local search is typically faster for concrete queries. - ## API Reference For detailed API documentation, see the diff --git a/site/content/ai-suite/reference/triton-inference-server.md b/site/content/ai-suite/reference/triton-inference-server.md index 458226743e..1e1b982932 100644 --- a/site/content/ai-suite/reference/triton-inference-server.md +++ b/site/content/ai-suite/reference/triton-inference-server.md @@ -26,8 +26,8 @@ following steps: 1. Install the Triton LLM Host service. 2. Register your LLM model to MLflow by uploading the required files. -3. Configure the [Importer](importer.md#using-triton-inference-server-private-llm) service to use your LLM model. -4. Configure the [Retriever](retriever.md#using-triton-inference-server-private-llm) service to use your LLM model. +3. Configure the [Importer](importer.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model. +4. Configure the [Retriever](retriever.md#using-triton-inference-server-for-chat-and-embedding) service to use your LLM model. {{< tip >}} Check out the dedicated [ArangoDB MLflow](mlflow.md) documentation page to learn