@Kylejeong2
Member

what

Adding simple eval tests to make sure the agent tool works and doesn't regress during changes.

@greptile-apps
Contributor

greptile-apps bot commented Nov 16, 2025

Greptile Summary

  • Added three new evaluation test workflows across different config files to validate the new browserbase_stagehand_agent tool
  • Tests cover basic agent navigation, smoke testing, and complex multi-step tasks to ensure the agent functionality works correctly and doesn't regress

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • All changes are test configuration additions with no production code modifications, properly structured test workflows, and appropriate dependency updates
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| evals/mcp-eval-basic.config.json | Added agent-basic-test workflow to test autonomous agent with simple navigation task |
| evals/mcp-eval-minimal.config.json | Added smoke-test-agent workflow to verify agent tool works with basic task |
| evals/mcp-eval.config.json | Added agent-complex-task-test workflow with multi-step Hacker News scraping task |

Sequence Diagram

sequenceDiagram
    participant User
    participant EvalRunner
    participant MCPServer
    participant Agent
    participant Browser
    participant Website
    
    User->>EvalRunner: "Run agent eval test"
    EvalRunner->>MCPServer: "browserbase_session_create"
    MCPServer->>Browser: "Initialize browser session"
    Browser-->>MCPServer: "Session ID"
    EvalRunner->>MCPServer: "browserbase_stagehand_agent(prompt)"
    MCPServer->>Agent: "execute(instruction, maxSteps=20)"
    Agent->>Browser: "Navigate to URL"
    Browser->>Website: "HTTP request"
    Website-->>Browser: "Page content"
    Agent->>Browser: "Extract data"
    Browser-->>Agent: "Extracted result"
    Agent-->>MCPServer: "result.message"
    MCPServer-->>EvalRunner: "Agent result"
    EvalRunner->>MCPServer: "browserbase_session_close"
    MCPServer->>Browser: "Close session"
    EvalRunner-->>User: "Test result (pass/fail)"
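The flow in the sequence diagram can be sketched roughly as follows. This is a minimal illustration, not the eval runner's actual code: the `call_tool` callable, the `FakeMCP` stand-in, the `{"message": ...}` result shape, and the substring-based pass check are all assumptions made for the example. Only the three tool names and the `prompt` argument come from the diagram.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature for an MCP client call: (tool_name, arguments) -> result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_agent_eval(call_tool: ToolCaller, prompt: str, expected: str) -> bool:
    """Create a session, run the agent once, and always close the session."""
    call_tool("browserbase_session_create", {})
    try:
        result = call_tool("browserbase_stagehand_agent", {"prompt": prompt})
        # Assumed result shape: a dict carrying the agent's final message.
        message = str(result.get("message", ""))
        # Assumed pass criterion: case-insensitive substring match.
        return expected.lower() in message.lower()
    finally:
        # Close the session even if the agent call raised.
        call_tool("browserbase_session_close", {})


class FakeMCP:
    """Offline stand-in for a real MCP server; records the tools called."""

    def __init__(self, agent_message: str) -> None:
        self.agent_message = agent_message
        self.calls: List[str] = []

    def __call__(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append(name)
        if name == "browserbase_stagehand_agent":
            return {"message": self.agent_message}
        return {}


fake = FakeMCP("The page title is Example Domain")
passed = run_agent_eval(
    fake, "Go to example.com and report the page title", "example domain"
)
print("pass" if passed else "fail")
```

Wrapping the agent call in `try`/`finally` mirrors the diagram's final `browserbase_session_close` step, so a failed agent run would not leave a browser session open.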

@greptile-apps greptile-apps bot left a comment

4 files reviewed, no comments

