Skip to content

Conversation

@lalvarezt
Copy link
Owner

This PR simplifies the template parser by removing expensive lookahead operations and consolidating multiple context-specific argument parsers into a single, unified approach.

The Problem

The previous grammar implementation attempted to be "smart" by using context-specific parsing rules:

  • simple_arg for basic operations (append, prepend, join)
  • regex_arg for regex patterns
  • split_arg for split separators
  • map_regex_arg for regex within map operations

Each parser had different lookahead logic to determine when special characters (|, :, {, }) should be treated as literals versus syntax. This required extensive operation_keyword lookahead and complex negative assertions like:

split_content = { !(":" ~ (number | range_part)) ~ !("|" ~ operation_keyword) ~ !("}" ~ EOI) ~ ANY }

The Solution

The refactor introduces a single unified argument parser:

argument     = { (escaped_char | normal_char)* }
normal_char  = { !("|" | "}" | "{" | ":" | "\\") ~ ANY }
escaped_char = { "\\" ~ ANY }

All operations now use the same escaping rules. Special characters require explicit escaping - no exceptions, no context-dependent behavior.

Why This is Better

Performance: Lookahead operations are expensive. Removing them provides immediate parsing performance improvements, especially for complex templates.

Maintainability: One set of rules to understand, test, and maintain instead of four context-specific parsers.

Predictability: Users now have clear, consistent rules. If a character is special (|, :, {, }, \), escape it. No need to understand parser internals or memorize context-specific exceptions.

Explicitness: Templates become more intentional. Compare:

{split:|:0}           # old: relies on smart parsing
{split:\|:0}          # new: explicit intent

The second version clearly shows what's a separator and what's syntax.

Robustness: Fewer edge cases means fewer bugs. Complex lookahead logic often fails in corner cases that are hard to predict.

The Trade-off

This is a breaking change. Templates that previously relied on smart escaping need updates:

  • Pipes in split separators: {split:|:0}{split:\|:0}
  • Colons in regex patterns: {regex_extract:Version: (\d+):1}{regex_extract:Version\: (\d+):1}

However, the migration path is straightforward, and the long-term benefits of a simpler, faster, more predictable parser far outweigh the one-time update cost.

@lalvarezt lalvarezt self-assigned this Nov 5, 2025
@lalvarezt lalvarezt added the enhancement New feature or request label Nov 5, 2025
@github-actions

This comment was marked as outdated.

@lalvarezt lalvarezt force-pushed the simplified-grammar branch 2 times, most recently from a936d14 to c1e6fa5 Compare November 6, 2025 09:32
@lalvarezt

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@lalvarezt
Copy link
Owner Author

/bench d264124 a194856

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

🚀 Benchmark comparison started

Comparing:

  • Baseline: d2641242c4628bc725ea5fa0602db1f5a51d8f9d
  • Current: a194856553402e5645376917e43726d816a3b581

Parameters:

  • Iterations: 100
  • Sizes: 1000,5000,10000

Results will be posted here when complete...

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

🔬 Benchmark Comparison Report

Requested by: @lalvarezt

Comparison:

  • Baseline (older): d2641242c4628bc725ea5fa0602db1f5a51d8f9d (d264124)
  • Current (newer): a194856553402e5645376917e43726d816a3b581 (a194856)

Parameters:

  • Iterations: 100
  • Sizes: 1000,5000,10000

📊 Benchmark Comparison Report

Input Size: 10,000 paths

Baseline Timestamp: 1762543388
Current Timestamp: 1762543498

Performance Comparison

Template Avg/Path Change p95 Change Throughput Change
Basename no ext 1.22μs ➖ -1.3% 1.26μs ➖ -0.4% 817.14K/s ➖ +1.3%
Breadcrumb last 3 1.57μs ➖ -0.3% 1.61μs ➖ +0.6% 635.35K/s ➖ +0.2%
Chain: map complex 3.22μs 🟢 -52.0% 3.29μs 🟢 -52.5% 310.78K/s 🟢 +108.2%
Chain: split+filter+sort+join 5.73μs ➖ -1.2% 5.77μs ➖ -1.4% 174.46K/s ➖ +1.2%
Chain: trim+upper+pad 1.39μs ➖ +0.8% 1.42μs ➖ +0.7% 718.51K/s ➖ -0.8%
Extract directory 1.35μs ➖ +0.7% 1.37μs ➖ +0.3% 739.97K/s ➖ -0.7%
Extract filename 375ns 🟡 +4.2% 396ns ⚠️ +5.3% 2.67M/s 🟡 -4.0%
File extension 1.23μs ➖ -1.7% 1.28μs 🟢 -6.8% 812.71K/s ➖ +1.7%
Filter 5.27μs ➖ +0.5% 5.32μs ➖ +0.2% 189.64K/s ➖ -0.5%
Join 1.34μs ➖ -0.8% 1.36μs ✅ -2.9% 748.21K/s ➖ +0.8%
Lower 940ns 🟡 +3.6% 1.00μs ⚠️ +6.8% 1.06M/s 🟡 -3.5%
Normalize filename 1.07μs ✅ -2.3% 1.10μs ➖ -1.3% 930.86K/s ✅ +2.3%
Pad 1.20μs ➖ +1.6% 1.23μs 🟡 +2.4% 830.72K/s ➖ -1.6%
Regex extract filename 4.86μs ➖ -0.9% 4.91μs ✅ -3.6% 205.86K/s ➖ +0.9%
Remove hidden dirs 5.03μs ➖ +1.4% 5.07μs ➖ +0.7% 198.98K/s ➖ -1.4%
Replace complex 3.45μs ➖ +0.6% 3.51μs ➖ +1.7% 289.94K/s ➖ -0.6%
Replace simple 3.59μs ➖ -0.9% 3.65μs ➖ -1.5% 278.30K/s ➖ +0.9%
Reverse 920ns ➖ +0.4% 949ns ➖ +1.0% 1.09M/s ➖ -0.4%
Slug generation 493ns 🟢 -45.9% 506ns 🟢 -45.7% 2.03M/s 🟢 +84.7%
Sort 1.75μs ➖ +1.3% 1.77μs ➖ +0.9% 571.43K/s ➖ -1.3%
Split all 540ns ✅ -4.9% 563ns 🟢 -5.5% 1.85M/s 🟢 +5.2%
Split last index 353ns 🟢 -8.5% 367ns 🟢 -10.9% 2.83M/s 🟢 +9.4%
Strip ANSI 276ns ✅ -2.8% 279ns 🟢 -7.3% 3.62M/s ✅ +3.1%
Substring 1.11μs ➖ -0.2% 1.16μs ➖ +0.8% 898.03K/s ➖ +0.2%
Trim 1.04μs ➖ -0.1% 1.09μs 🟡 +2.2% 964.47K/s ➖ +0.1%
Unique 2.25μs ➖ -0.7% 2.28μs ➖ -0.4% 443.45K/s ➖ +0.7%
Upper 892ns ➖ -0.7% 920ns ➖ -1.0% 1.12M/s ➖ +0.7%
Uppercase all components 2.42μs 🟡 +2.0% 2.81μs 🔴 +17.4% 413.49K/s ➖ -2.0%

Summary

  • Total templates compared: 28
  • Improvements: 3 🟢
  • Regressions: 0 🔴
  • Neutral: 25 ➖

✨ Performance Improvements

  • Chain: map complex: 52.0% faster
  • Slug generation: 45.9% faster
  • Split last index: 8.5% faster

Legend

  • 🟢 Significant improvement (>5% faster)
  • ✅ Improvement (2-5% faster)
  • ➖ Neutral (<2% change)
  • 🟡 Caution (2-5% slower)
  • ⚠️ Warning (5-10% slower)
  • 🔴 Regression (>10% slower)

Triggered by /bench command

lalvarezt and others added 5 commits November 9, 2025 13:59
…head that offer no real benefit

expect some change in templates that relied on smart escaping, now it's
more intentional
Reorganized operation alternatives based on actual benchmark usage
patterns to optimize PEG parser performance. Most frequently used
operations (split, join, upper, lower, trim, substring, reverse) are now
tested first.
Add fast-path optimizations for filter, filter_not, and replace
operations when patterns contain no regex metacharacters. This avoids
unnecessary regex compilation and matching for simple literal string
operations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants