Feature heuristic checks + cleanup #44

nitink23 · 2025-12-02T19:39:54Z

What's the issue or discussion related to this PR?

This PR adds a better way to define heuristic checks and cleans up unnecessary files and unused functionality. It also fixes the bug in versioning specialist definitions, so the version now increases every time a specialist is enriched. You can also bump the version of your specialist manually.
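To make the versioning behaviour concrete, here is a minimal sketch of a bump helper. It is an illustration only: the `MAJOR.MINOR.PATCH` version format, the `bumpVersion` name, and the idea that enrichment bumps the patch number are all assumptions, not the scheme this PR actually implements.

```typescript
// Hypothetical helper: the real version format and bump rules are not shown in this PR.
type BumpKind = "major" | "minor" | "patch";

function bumpVersion(version: string, kind: BumpKind = "patch"): string {
  // Assumption: versions look like "1.2.3".
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (match === null) {
    throw new Error(`Invalid version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  switch (kind) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
  }
}

// Assumed behaviour: an enrichment run bumps the patch number,
// while a manual bump can pick any part.
const afterEnrichment = bumpVersion("1.2.3");
const afterManualBump = bumpVersion("1.2.3", "minor");
```

Under these assumptions, an automatic bump and a manual bump share one code path, which keeps the two from drifting apart.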

What's added in this PR?

scenario.yaml files now contain a heuristic_checks section. A check can be a command run against the generated code, or a pattern matched against files, for example to look for sections of a PRD.

heuristic_checks:
  enabled: true

  # Critical validation commands
  commands:
    - name: "install_succeeds"
      command: "pnpm install"
      weight: 2.0
      description: "Dependencies install successfully"

    - name: "build_succeeds"
      command: "pnpm build"
      weight: 3.0
      description: "Production build completes without errors"

    - name: "lint_passes"
      command: "pnpm lint"
      weight: 1.0
      description: "Code meets linting standards"

  # Check for Client Component patterns
  patterns:
    - name: "has_use_client_directive"
      file: "app/**/*.tsx"
      pattern: "^['\"]use client['\"]"
      weight: 3.0
      description: "Client component has 'use client' directive"

    - name: "uses_state_hook"
      file: "app/**/*.tsx"
      pattern: "useState|const \\[.*,.*set.*\\]\\s*=\\s*useState"
      weight: 2.0
      description: "Uses useState for state management"

    - name: "has_event_handlers"
      file: "app/**/*.tsx"
      pattern: "onClick|onChange|onSubmit|on[A-Z]\\w+"
      weight: 1.5
      description: "Event handlers implemented for interactivity"

    - name: "has_effect_hook"
      file: "app/**/*.tsx"
      pattern: "useEffect"
      weight: 1.0
      description: "Uses useEffect for side effects (optional)"

    - name: "imports_react_hooks"
      file: "app/**/*.tsx"
      pattern: "import.*\\{.*useState.*\\}.*from.*['\"]react['\"]"
      weight: 1.5
      description: "Imports React hooks properly"

    - name: "has_typescript_types"
      file: "app/**/*.tsx"
      pattern: "interface|type\\s+\\w+.*="
      weight: 0.5
      description: "TypeScript types defined"

Along with the new enhancements, the template version has increased.
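As a rough illustration of how the `patterns` entries and their weights could combine into a score, here is a minimal sketch. The type names, the `runPatternChecks` and `weightedScore` functions, the "passes if any file matches" rule, the multiline regex flag, and the normalisation to a 0..1 score are all assumptions made for this example. The actual evaluator may work differently. Glob resolution is skipped: the sketch takes file contents that are assumed to be already selected by the `file` glob.

```typescript
// Hypothetical shapes mirroring a `patterns` entry in scenario.yaml.
interface PatternCheck {
  name: string;
  file: string; // glob from the config; not resolved in this sketch
  pattern: string; // regular expression source
  weight: number;
}

interface CheckResult {
  name: string;
  passed: boolean;
  weight: number;
}

// Assumption: a check passes when its pattern matches at least one file.
// The "m" flag (also an assumption) lets ^ match at the start of any line.
function runPatternChecks(
  checks: PatternCheck[],
  files: Record<string, string>,
): CheckResult[] {
  const contents = Object.values(files);
  return checks.map((check) => {
    const regex = new RegExp(check.pattern, "m");
    const passed = contents.some((text) => regex.test(text));
    return { name: check.name, passed, weight: check.weight };
  });
}

// Assumption: the score is the passed weight divided by the total weight.
function weightedScore(results: CheckResult[]): number {
  const total = results.reduce((sum, r) => sum + r.weight, 0);
  if (total === 0) return 0;
  const earned = results.reduce((sum, r) => sum + (r.passed ? r.weight : 0), 0);
  return earned / total;
}

// Usage with three of the checks from the config and one in-memory file.
const checks: PatternCheck[] = [
  { name: "has_use_client_directive", file: "app/**/*.tsx", pattern: "^['\"]use client['\"]", weight: 3.0 },
  { name: "uses_state_hook", file: "app/**/*.tsx", pattern: "useState", weight: 2.0 },
  { name: "has_effect_hook", file: "app/**/*.tsx", pattern: "useEffect", weight: 1.0 },
];

const files: Record<string, string> = {
  "app/counter.tsx": '"use client";\nimport { useState } from "react";\n',
};

const results = runPatternChecks(checks, files);
console.log(weightedScore(results).toFixed(2)); // 0.83 (5.0 of 6.0 weight passed)
```

With this normalisation, a missing low-weight check such as `has_effect_hook` costs little, while a missing `'use client'` directive removes half of the score in this three-check example.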

What are the steps to test this PR?

pnpm bench

Run 002-client-component in the next.js suite to view heuristic changes.

To test version bumping, enable enrichment as well.

Documentation update for this PR (if applicable)?

Documentation already exists in docs/heuristic-checks-guide.md, which covers the heuristic checks feature comprehensively. The guide includes:

Configuration examples
Check types (commands, files, patterns, structured, scripts)
Weighting system
Scoring integration
Practical examples

No additional documentation updates required as the existing guide already covers the new heuristic_checks section in scenario.yaml files.

(Optional) What's left to be done for this PR?

[] MCP + OAuth should be created @Nsttt
[] Make an easier way to create pattern-matching heuristics
[] Prompt and conversation should be sent to R2 instead of D1, since the JSON payloads get very large for longer benchmarks

(Optional) What's the potential risk and how to mitigate it?

Who do you wish to review this PR other than required reviewers?

@Nsttt @zackarychapple

(Required) Pre-PR/Merge checklist

I have added/updated our documentation to cover this new behavior
I have added an explanation of my changes
I have written new tests (if applicable)
I have tested this locally (from a first-time user's point of view, having never touched this app before)
I have mentioned the related person or team responsible for reviewing proposed changes
I have run or will run tests, or will ask for help adding tests

nitink23 and others added 14 commits December 1, 2025 19:10

Fix: simplify

a19acd6

fixed the tool call

a92ba6a

refactored and added human scores

e4738c5

human scorer created

b3ff5ae

human scorer created

043c15d

cleanup scripts which was not being used

ecf846a

removed the scorer

771348f

cleanup human scorer logs

2a62b0d

versioning enrichment works and manual bump

6d5f6c9

remove enrichments for now

29878d9

feat heuristic checks

e399186

feat heuristic checks

643da6e

feat heuristic checks v2

7930430

remove duplicate code and cleanup

1be84cf

nitink23 changed the title ~~Feat/human in the loop~~ Feature heuristic checks + cleanup Dec 4, 2025

nitink23 marked this pull request as ready for review December 4, 2025 22:47

nitink23 marked this pull request as draft December 4, 2025 22:47

remove human scorer folder

6b3e711

nitink23 marked this pull request as ready for review December 5, 2025 22:34

swalker326 approved these changes Dec 6, 2025
