feat(shared): unify VQA and grounding models into insight model

Unified MIDSCENE_VQA_MODEL_* and MIDSCENE_GROUNDING_MODEL_* environment
variables into a single MIDSCENE_INSIGHT_MODEL_* configuration.

Changes:
- Updated type definitions to use 'insight' intent instead of 'VQA' and 'grounding'
- Unified 12 environment variables into 6 INSIGHT variables
- Updated all agent code to use 'insight' intent
- Fixed all test cases (140/140 passing)
- Added comprehensive documentation for intent-based model configuration
- Fixed duplicate case clause warnings in test files

Breaking changes:
- Replaced TIntent type: 'VQA' | 'grounding' -> 'insight'
- Environment variables MIDSCENE_VQA_MODEL_* and MIDSCENE_GROUNDING_MODEL_*
  are no longer supported

Documentation updates:
- Added detailed intent-based configuration guide in model-provider.mdx (EN/ZH)
- Updated API documentation with modelConfig examples (EN/ZH)
- Updated choose-a-model.mdx with task type configuration section (EN/ZH)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
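
Because the `MIDSCENE_VQA_MODEL_*` and `MIDSCENE_GROUNDING_MODEL_*` variables are no longer supported, existing setups need their keys renamed to the `MIDSCENE_INSIGHT_MODEL_*` prefix. The following is a minimal sketch of such a rename, not code from this commit: `migrateInsightEnv`, `Env`, and `MigrationResult` are hypothetical names, and the sketch assumes each legacy suffix maps one-to-one onto the same suffix under the new prefix.

```typescript
// Hypothetical migration helper; not part of Midscene.
// Assumption: every legacy suffix carries over unchanged to the new prefix.
const LEGACY_PREFIXES = ["MIDSCENE_VQA_MODEL_", "MIDSCENE_GROUNDING_MODEL_"];
const INSIGHT_PREFIX = "MIDSCENE_INSIGHT_MODEL_";

type Env = Record<string, string | undefined>;

interface MigrationResult {
  env: Env; // copy of the input with legacy keys renamed
  conflicts: string[]; // insight keys that received two different values
}

function migrateInsightEnv(env: Env): MigrationResult {
  const result: Env = { ...env };
  const conflicts: string[] = [];

  for (const [key, value] of Object.entries(env)) {
    const prefix = LEGACY_PREFIXES.find((p) => key.startsWith(p));
    if (prefix === undefined) continue; // not a legacy key, keep as-is

    delete result[key];
    if (value === undefined) continue;

    const target = INSIGHT_PREFIX + key.slice(prefix.length);
    const existing = result[target];
    if (existing === undefined) {
      result[target] = value;
    } else if (existing !== value && !conflicts.includes(target)) {
      // An explicit insight value, or the first legacy value seen, wins.
      conflicts.push(target);
    }
  }

  return { env: result, conflicts };
}
```

The sketch returns a new object rather than mutating its input, and it reports conflicting values instead of choosing between them silently, since the old VQA and grounding settings may have pointed at different models.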
apps/site/docs/en/api.mdx
41 additions & 2 deletions
@@ -27,9 +27,14 @@ In Playwright and Puppeteer, there are some common parameters:

 These Agents also support the following advanced configuration parameters:

-- `modelConfig: () => IModelConfig`: Optional. Custom model configuration function. Allows you to dynamically configure different models through code instead of environment variables. This is particularly useful when you need to use different models for different AI tasks (such as VQA, planning, grounding, etc.).
+- `modelConfig: (params: { intent: TIntent }) => IModelConfig`: Optional. Custom model configuration function. Allows you to dynamically configure different models through code instead of environment variables. This is particularly useful when you need to use different models for different AI tasks (such as Insight, Planning, etc.).

-**Example:**
+The function receives a parameter object with an `intent` field indicating the current task type:
+- `'insight'`: Visual understanding and element location tasks (such as `aiQuery`, `aiLocate`, `aiTap`, etc.)
+- `'planning'`: Automatic planning tasks (such as `aiAct`)
+- `'default'`: Other uncategorized tasks
+
+**Basic Example:**
 ```typescript
 const agent = new PuppeteerAgent(page, {
   modelConfig: () => ({
@@ -41,6 +46,40 @@ These Agents also support the following advanced configuration parameters:
 });
 ```

+**Configure different models for different task types:**
+```typescript
+const agent = new PuppeteerAgent(page, {
+  modelConfig: ({ intent }) => {
+    // Use Qwen-VL model for Insight tasks (for visual understanding and location)
+For more information about configuring models by task type, refer to the [Configure model and provider](./model-provider#configure-models-by-task-type-advanced) documentation.
+
 - `createOpenAIClient: (openai, options) => Promise<OpenAI | undefined>`: Optional. Custom OpenAI client wrapper function. Allows you to wrap the OpenAI client instance for integrating observability tools (such as LangSmith, LangFuse) or applying custom middleware.
+If a task type's configuration is not set, Midscene will automatically use the default `MIDSCENE_MODEL_*` configuration. In most cases, you only need to configure the default `MIDSCENE_MODEL_*` variables.
+
+:::
+
 ### Advanced configs

 Some advanced configs are also supported. Usually you don't need to use them.
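
The fallback rule stated in the diff above (a task type without its own configuration uses the default `MIDSCENE_MODEL_*` settings) can be mirrored in a code-level `modelConfig` function. The following is a rough sketch under stated assumptions, not Midscene's implementation: `TIntent` is redeclared locally from the three documented intent values, `ModelConfigSketch` is a flat key/value stand-in for `IModelConfig`, `createModelConfig` is a hypothetical helper, and the `NAME` key suffix and model names are placeholders.

```typescript
// Local redeclaration based on the intent values listed in the updated docs.
type TIntent = "insight" | "planning" | "default";

// Assumption: a flat key/value record stands in for Midscene's IModelConfig.
type ModelConfigSketch = Record<string, string>;

// Hypothetical helper: builds a function with the documented shape
// (params: { intent }) => config, falling back to `defaults` whenever
// the requested intent has no override of its own.
function createModelConfig(
  defaults: ModelConfigSketch,
  overrides: Partial<Record<TIntent, ModelConfigSketch>>,
): (params: { intent: TIntent }) => ModelConfigSketch {
  return ({ intent }) => overrides[intent] ?? defaults;
}

// Usage: only Insight tasks get a dedicated model; Planning and
// uncategorized tasks use the default configuration.
// The key suffix `NAME` and both model names are placeholders.
const modelConfig = createModelConfig(
  { MIDSCENE_MODEL_NAME: "example-default-model" },
  { insight: { MIDSCENE_INSIGHT_MODEL_NAME: "example-vision-model" } },
);
```

In this sketch an override replaces the default configuration wholesale rather than being merged with it, which keeps each task type's settings self-contained; a merge would be an equally plausible design if shared values such as credentials should carry over.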