@lunchpaillola
Owner

Overview

Update the prompts used in the self-operating-computer framework so that the agent can better handle and reason about scrolling actions. This will improve the agent's ability to interact with interfaces that require scrolling to access content or controls.

Features

  • Revise and enhance prompt templates or instructions related to scrolling actions.
  • Ensure prompts clearly instruct the agent on when and how to scroll (e.g., scroll down to reveal hidden elements, scroll up to find navigation).
  • Add examples or clarifications for common scrolling scenarios (infinite scroll, paginated content, scrollbars, etc.).
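
As a rough illustration of the features above, a scrolling section could be generated from a small scenario table. This is only a sketch: `SCROLL_SCENARIOS` and `render_scrolling_guidance` are hypothetical names, the scenario wording is invented for illustration, and the key names (pagedown, pageup, home, end) are the ones mentioned later in this conversation.

```python
# Hypothetical scenario table: situation -> key to press.
# The wording is illustrative and not taken from the repository.
SCROLL_SCENARIOS = {
    "the target is below the visible area": "pagedown",
    "the navigation bar is above the visible area": "pageup",
    "the task needs the top of the page": "home",
    "the task needs the bottom of the page": "end",
}


def render_scrolling_guidance(scenarios: dict[str, str]) -> str:
    """Format the scenario table as a plain-text prompt section."""
    lines = ["SCROLLING GUIDANCE:"]
    for situation, key in scenarios.items():
        lines.append(f"- If {situation}, press {key}.")
    # Infinite-scroll and paginated pages change after every scroll,
    # so the agent should look again before choosing the next action.
    lines.append("- After each scroll, re-check the screenshot before acting.")
    return "\n".join(lines)


print(render_scrolling_guidance(SCROLL_SCENARIOS))
```

Keeping the scenarios in a table rather than in free text would make it easier to add cases such as infinite scroll or paginated content without rewriting the whole section.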

Implementation

  • Identify where prompt templates or instructions for UI actions are defined in the codebase.
  • Update prompts to include more explicit guidance and context for scrolling.
  • Add or revise examples in prompts to cover a variety of scrolling use cases.
  • Test prompt changes to ensure the agent can reliably scroll to locate and interact with off-screen elements.
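
One possible way to apply the update step above is to append the guidance to each prompt template with a small helper that is safe to run more than once. The helper name `add_scrolling_guidance` and the stand-in prompt text are assumptions made for this sketch; the real templates live wherever the codebase defines them.

```python
GUIDANCE_HEADER = "SCROLLING GUIDANCE"


def add_scrolling_guidance(prompt: str, guidance: str) -> str:
    """Append a guidance section once; leave the prompt unchanged if it is already there."""
    if GUIDANCE_HEADER in prompt:
        return prompt
    return f"{prompt.rstrip()}\n\n{GUIDANCE_HEADER}:\n{guidance.strip()}\n"


# Stand-in for a real prompt template (hypothetical content).
base_prompt = "You are operating a computer. Choose the next action.\n"
guidance = "Press pagedown when the element you need is below the visible area."

updated = add_scrolling_guidance(base_prompt, guidance)
print(updated)

# Applying the helper a second time does not duplicate the section.
print(add_scrolling_guidance(updated, guidance) == updated)
```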

Testing

  • Simulate tasks that require scrolling (e.g., finding buttons or links that are not initially visible).
  • Verify that the agent can successfully scroll and complete the intended actions.
  • Confirm that updates do not negatively impact other UI interaction capabilities.
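
The last check above, that existing capabilities are not affected, can be partly covered without running the agent by asserting that a prompt still mentions its earlier operations alongside the new scroll keys. The sketch below assumes the existing operations are named click, write, press, and done; both that list and the stand-in prompt are assumptions, not taken from this PR.

```python
import re


def missing_terms(prompt: str, required: list[str]) -> list[str]:
    """Return the required terms that do not appear as whole words in the prompt."""
    return [
        term
        for term in required
        if not re.search(rf"\b{re.escape(term)}\b", prompt, re.IGNORECASE)
    ]


SCROLL_TERMS = ["pagedown", "pageup", "home", "end"]
EXISTING_TERMS = ["click", "write", "press", "done"]  # assumed operation names

# Stand-in prompt text (hypothetical).
prompt = (
    "Operations: click, write, press, done. "
    "To scroll, press pagedown or pageup; use home and end to jump."
)

print(missing_terms(prompt, SCROLL_TERMS + EXISTING_TERMS))
```

Whole-word matching matters here because a plain substring test would find "end" inside words such as "send" and report a false pass.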

Initial commit for feature branch
@lunchpaillola
Owner Author

@cursor

Please implement the feature described in this PR. The PRD contains all the requirements and implementation details.

Key points:

  • This is a new feature implementation
  • Follow the specifications in the PRD above
  • Create any necessary tests
  • Ensure code quality and documentation

Let me know if you need any clarification on the requirements.

@cursor

cursor bot commented Aug 12, 2025

I added scrolling support by extending all system prompts with detailed guidance and practical examples. I also expanded test coverage with new scrolling-specific test cases and a dedicated test suite to validate the agent's ability to navigate scrollable interfaces.

  • I modified operate/models/prompts.py to include a new "SCROLLING GUIDANCE" section and practical examples in SYSTEM_PROMPT_STANDARD, SYSTEM_PROMPT_LABELED, and SYSTEM_PROMPT_OCR. This teaches the agent when and how to use pagedown, pageup, home, and end keys for navigation.
  • I added a new test file, test_scrolling.py, which includes unit tests to verify prompt content and key recognition, and integration test placeholders for real-world scrolling scenarios.
  • I extended evaluate.py by adding new scrolling-specific test cases, such as navigating Google.com and Wikipedia.org, to assess the agent's performance in scroll-heavy tasks.
  • I created SCROLLING_IMPLEMENTATION.md to document the overview, features, and implementation details of the new scrolling support.
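
To make the key-recognition idea in the list above concrete, here is a minimal sketch of how a scroll action could be recognized in a model response. The JSON shape (an `operation` field and a `keys` list) is an assumption for illustration, and `is_scroll_action` is a hypothetical helper, not a function from this PR.

```python
import json

SCROLL_KEYS = {"pagedown", "pageup", "home", "end"}


def is_scroll_action(action: dict) -> bool:
    """True when the action is a key press made up only of scroll keys."""
    keys = action.get("keys") or []
    return (
        action.get("operation") == "press"
        and len(keys) > 0
        and all(key in SCROLL_KEYS for key in keys)
    )


# Assumed response format; the field names are illustrative.
raw = '{"thought": "The button is below the fold", "operation": "press", "keys": ["pagedown"]}'
action = json.loads(raw)
print(is_scroll_action(action))
```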

6 tasks completed

  • Explore codebase structure and identify prompt template locations
  • Analyze current prompt templates for UI actions
  • Update prompts to include explicit scrolling guidance
  • Add scrolling examples and use cases to prompts
  • Create tests for scrolling functionality
  • Verify implementation doesn't break existing UI interactions

Co-authored-by: lola <lola@lunchpaillabs.com>

3 participants