Skip to content

Conversation

@ycedres
Copy link
Contributor

@ycedres ycedres commented Oct 14, 2025

The goal of this PR is improving the test runner and the test results. At this point some more automatic testing improvement is needed but this is a step forward.

  • Add placeholders to allow easier test configuration.

  • Improve judge model system prompt.

  • Improve confirmation mechanism for write-enabled tools (this allows confirmation mechanism of testing to also work).

  • Included patch produced in (so related tests pass)
    Fix nested tool calls #34

@ycedres ycedres force-pushed the test-placeholders branch 2 times, most recently from 37184fb to d5b9ea1 Compare November 11, 2025 10:57
@ycedres ycedres requested a review from cbbayburt November 11, 2025 11:06
@ycedres ycedres marked this pull request as ready for review November 11, 2025 11:06
Copy link
Contributor

@cbbayburt cbbayburt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ycedres.

Just some minor comments below, but overall, looks good.

logger.info(log_string)
await ctx.info(log_string)

is_confirmed = str(confirm).lower() == 'true'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confirmed check looks different than the others. Can we extract the following parsing logic to a helper method to unify it and avoid repetition?

is_confirmed = str(confirm).lower() in ('true', 'yes', '1')

{
"id": "TC-OPS-001_confirm_request",
"prompt": "Add a new system at host 10.145.211.203 with activation key '1-DEBLIKE-KEY'",
"prompt": "Can you add a new system at host {new_system_host} with activation key '{key_deblike}'?",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the imperative "Do this .." instead of "Can you do this .." feels more natural IMO. What's the reason we prefer this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I was using Gemini to batch refactor and replace fixed values with placeholders and I did not realise it changed also this.

@ycedres ycedres requested a review from cbbayburt November 27, 2025 10:27
Copy link
Contributor

@cbbayburt cbbayburt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @ycedres. LGTM 👍 feel free to merge

@ycedres ycedres merged commit c22a433 into uyuni-project:main Nov 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants