tests: mcp agent evals for new tool #134

Kylejeong2 · 2025-11-16T21:34:43Z

what

Adds simple eval tests to make sure the agent tool works and does not regress as the code changes.

greptile-apps · 2025-11-16T21:37:03Z

Greptile Summary

Added three new evaluation test workflows across different config files to validate the new browserbase_stagehand_agent tool
Tests cover basic agent navigation, smoke testing, and complex multi-step tasks to ensure the agent functionality works correctly and doesn't regress

Confidence Score: 5/5

This PR is safe to merge with no risk
All changes are test configuration additions with no production code modifications, properly structured test workflows, and appropriate dependency updates
No files require special attention

Important Files Changed

Filename	Overview
evals/mcp-eval-basic.config.json	Added agent-basic-test workflow to test autonomous agent with simple navigation task
evals/mcp-eval-minimal.config.json	Added smoke-test-agent workflow to verify agent tool works with basic task
evals/mcp-eval.config.json	Added agent-complex-task-test workflow with multi-step Hacker News scraping task
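
As a rough illustration of how these three additions could be guarded against accidental removal, the sketch below checks that each config file still contains its agent workflow. The file and workflow names come from the table; the `EvalConfig` shape (a top-level `workflows` array of named entries) and the `findMissingWorkflows` helper are assumptions for illustration, since the actual config schema is not shown in this PR.

```typescript
// Workflow names per config file, taken from the table above.
const expectedWorkflows: Record<string, string> = {
  "evals/mcp-eval-basic.config.json": "agent-basic-test",
  "evals/mcp-eval-minimal.config.json": "smoke-test-agent",
  "evals/mcp-eval.config.json": "agent-complex-task-test",
};

// Hypothetical config shape: assumes a top-level `workflows` array of named entries.
// The real schema of the eval config files may differ.
interface EvalConfig {
  workflows: { name: string }[];
}

// Returns the config files whose expected agent workflow is missing
// (either the file was not provided or it has no workflow with that name).
function findMissingWorkflows(configs: Record<string, EvalConfig>): string[] {
  return Object.entries(expectedWorkflows)
    .filter(([file, workflow]) => {
      const config = configs[file];
      return !config || !config.workflows.some((w) => w.name === workflow);
    })
    .map(([file]) => file);
}
```

In practice the `configs` argument would be built by parsing each JSON file; an empty result means all three agent workflows are still present.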

Sequence Diagram

sequenceDiagram
    participant User
    participant EvalRunner
    participant MCPServer
    participant Agent
    participant Browser
    participant Website
    
    User->>EvalRunner: "Run agent eval test"
    EvalRunner->>MCPServer: "browserbase_session_create"
    MCPServer->>Browser: "Initialize browser session"
    Browser-->>MCPServer: "Session ID"
    EvalRunner->>MCPServer: "browserbase_stagehand_agent(prompt)"
    MCPServer->>Agent: "execute(instruction, maxSteps=20)"
    Agent->>Browser: "Navigate to URL"
    Browser->>Website: "HTTP request"
    Website-->>Browser: "Page content"
    Agent->>Browser: "Extract data"
    Browser-->>Agent: "Extracted result"
    Agent-->>MCPServer: "result.message"
    MCPServer-->>EvalRunner: "Agent result"
    EvalRunner->>MCPServer: "browserbase_session_close"
    MCPServer->>Browser: "Close session"
    EvalRunner-->>User: "Test result (pass/fail)"
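
The flow in the diagram can be sketched as a small runner, shown here against an in-memory fake so it runs without a browser. The three tool names and the `prompt` argument come from the diagram; the `ToolClient` interface, `runAgentEval`, `makeFakeClient`, and the substring-based pass check are hypothetical stand-ins, not the eval runner's actual API.

```typescript
// Hypothetical minimal client interface; the real MCP client API is not shown in this PR.
interface ToolClient {
  callTool(name: string, args: Record<string, unknown>): Promise<string>;
}

interface EvalResult {
  passed: boolean;
  output: string;
}

// Runs one agent eval: create a session, run the agent, and always close the session.
// The pass criterion (output contains a substring) is an assumption for illustration.
async function runAgentEval(
  client: ToolClient,
  prompt: string,
  expectedSubstring: string,
): Promise<EvalResult> {
  await client.callTool("browserbase_session_create", {});
  try {
    const output = await client.callTool("browserbase_stagehand_agent", { prompt });
    return { passed: output.includes(expectedSubstring), output };
  } finally {
    // Close the session even if the agent call throws.
    await client.callTool("browserbase_session_close", {});
  }
}

// In-memory fake that records the order of tool calls and returns a canned agent reply.
function makeFakeClient(agentReply: string): ToolClient & { calls: string[] } {
  const calls: string[] = [];
  return {
    calls,
    async callTool(name: string): Promise<string> {
      calls.push(name);
      return name === "browserbase_stagehand_agent" ? agentReply : "ok";
    },
  };
}
```

Putting the close call in `finally` mirrors the diagram's teardown step while ensuring a failed agent run does not leave a session open.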

4 files reviewed, no comments


Kylejeong2 added 2 commits November 16, 2025 13:33

tests: adding evals for agent tool

81f5080

lockfile

8307e5e

greptile-apps bot reviewed Nov 16, 2025
