refactor(pest)!: greatly simplified grammar, removed expensive look-ahead that offer no real benefit #11

lalvarezt · 2025-11-05T17:43:25Z

This PR simplifies the template parser by removing expensive lookahead operations and consolidating multiple context-specific argument parsers into a single, unified approach.

The Problem

The previous grammar implementation attempted to be "smart" by using context-specific parsing rules:

simple_arg for basic operations (append, prepend, join)
regex_arg for regex patterns
split_arg for split separators
map_regex_arg for regex within map operations

Each parser had different lookahead logic to determine when special characters (|, :, {, }) should be treated as literals versus syntax. This required extensive operation_keyword lookahead and complex negative assertions like:

split_content = { !(":" ~ (number | range_part)) ~ !("|" ~ operation_keyword) ~ !("}" ~ EOI) ~ ANY }

The Solution

The refactor introduces a single unified argument parser:

argument     = { (escaped_char | normal_char)* }
normal_char  = { !("|" | "}" | "{" | ":" | "\\") ~ ANY }
escaped_char = { "\\" ~ ANY }
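
For readers less familiar with pest, the three rules above amount to a single left-to-right scan with no lookahead beyond the next character. The sketch below is a hand-written approximation of that behaviour in plain Rust, not the parser pest generates for this project; the function name `scan_argument` is hypothetical.

```rust
/// Returns the byte length of the leading `argument` in `input`.
///
/// Mirrors the grammar rules: a backslash consumes itself plus the next
/// character (`escaped_char`), and an unescaped `|`, `}`, `{` or `:` ends
/// the argument because `normal_char` rejects it.
fn scan_argument(input: &str) -> usize {
    let mut chars = input.char_indices();
    while let Some((idx, ch)) = chars.next() {
        match ch {
            '\\' => {
                // escaped_char = "\\" ~ ANY: skip whatever follows the backslash.
                if chars.next().is_none() {
                    // A trailing backslash has nothing to escape, so neither
                    // rule matches and the argument stops before it.
                    return idx;
                }
            }
            // Unescaped syntax characters terminate the argument.
            '|' | '}' | '{' | ':' => return idx,
            _ => {}
        }
    }
    input.len()
}
```

With this scan, `scan_argument("|:0}")` stops at offset 0 because the pipe is read as syntax, while the escaped form `\|:0}` yields an argument of length 2 (the backslash and the pipe).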

All operations now use the same escaping rules. Special characters require explicit escaping - no exceptions, no context-dependent behavior.

Why This is Better

Performance: Lookahead operations are expensive. Removing them provides immediate parsing performance improvements, especially for complex templates.

Maintainability: One set of rules to understand, test, and maintain instead of four context-specific parsers.

Predictability: Users now have clear, consistent rules. If a character is special (|, :, {, }, \), escape it. No need to understand parser internals or memorize context-specific exceptions.

Explicitness: Templates become more intentional. Compare:

{split:|:0}           # old: relies on smart parsing
{split:\|:0}          # new: explicit intent

The second version clearly shows what's a separator and what's syntax.

Robustness: Fewer edge cases mean fewer bugs. Complex lookahead logic often fails in corner cases that are hard to predict.

The Trade-off

This is a breaking change. Templates that previously relied on smart escaping need updates:

Pipes in split separators: {split:|:0} → {split:\|:0}
Colons in regex patterns: {regex_extract:Version: (\d+):1} → {regex_extract:Version\: (\d+):1}

However, the migration path is straightforward, and the long-term benefits of a simpler, faster, more predictable parser far outweigh the one-time update cost.
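
When templates are generated programmatically, the escaping rule can be applied mechanically. The helper below is a minimal sketch under that assumption; `escape_argument` is a hypothetical name and is not part of this PR. It is meant for literal text such as split separators. Regex arguments need more care, since a blind pass would also double the backslash in sequences like `\d`, which the migration example above leaves untouched.

```rust
/// Backslash-escapes every character the unified grammar treats as special
/// (`|`, `:`, `{`, `}` and `\`), leaving all other characters unchanged.
/// Hypothetical helper for literal arguments only.
fn escape_argument(literal: &str) -> String {
    let mut out = String::with_capacity(literal.len());
    for ch in literal.chars() {
        if matches!(ch, '|' | ':' | '{' | '}' | '\\') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}
```

For example, a literal pipe separator becomes `\|`, so a generated template would read `{split:\|:0}`.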

lalvarezt · 2025-11-07T19:20:43Z

/bench d264124 a194856

github-actions · 2025-11-07T19:20:51Z

🚀 Benchmark comparison started

Comparing:

Baseline: d2641242c4628bc725ea5fa0602db1f5a51d8f9d
Current: a194856553402e5645376917e43726d816a3b581

Parameters:

Iterations: 100
Sizes: 1000,5000,10000

Results will be posted here when complete...

github-actions · 2025-11-07T19:24:58Z

🔬 Benchmark Comparison Report

Requested by: @lalvarezt

Comparison:

Baseline (older): d2641242c4628bc725ea5fa0602db1f5a51d8f9d (d264124)
Current (newer): a194856553402e5645376917e43726d816a3b581 (a194856)

Parameters:

Iterations: 100
Sizes: 1000,5000,10000

📊 Benchmark Comparison Report

Input Size: 10,000 paths

Baseline Timestamp: 1762543388
Current Timestamp: 1762543498

Performance Comparison

Template	Avg/Path	Avg/Path Change	p95	p95 Change	Throughput	Throughput Change
Basename no ext	1.22μs	➖ -1.3%	1.26μs	➖ -0.4%	817.14K/s	➖ +1.3%
Breadcrumb last 3	1.57μs	➖ -0.3%	1.61μs	➖ +0.6%	635.35K/s	➖ +0.2%
Chain: map complex	3.22μs	🟢 -52.0%	3.29μs	🟢 -52.5%	310.78K/s	🟢 +108.2%
Chain: split+filter+sort+join	5.73μs	➖ -1.2%	5.77μs	➖ -1.4%	174.46K/s	➖ +1.2%
Chain: trim+upper+pad	1.39μs	➖ +0.8%	1.42μs	➖ +0.7%	718.51K/s	➖ -0.8%
Extract directory	1.35μs	➖ +0.7%	1.37μs	➖ +0.3%	739.97K/s	➖ -0.7%
Extract filename	375ns	🟡 +4.2%	396ns	⚠️ +5.3%	2.67M/s	🟡 -4.0%
File extension	1.23μs	➖ -1.7%	1.28μs	🟢 -6.8%	812.71K/s	➖ +1.7%
Filter	5.27μs	➖ +0.5%	5.32μs	➖ +0.2%	189.64K/s	➖ -0.5%
Join	1.34μs	➖ -0.8%	1.36μs	✅ -2.9%	748.21K/s	➖ +0.8%
Lower	940ns	🟡 +3.6%	1.00μs	⚠️ +6.8%	1.06M/s	🟡 -3.5%
Normalize filename	1.07μs	✅ -2.3%	1.10μs	➖ -1.3%	930.86K/s	✅ +2.3%
Pad	1.20μs	➖ +1.6%	1.23μs	🟡 +2.4%	830.72K/s	➖ -1.6%
Regex extract filename	4.86μs	➖ -0.9%	4.91μs	✅ -3.6%	205.86K/s	➖ +0.9%
Remove hidden dirs	5.03μs	➖ +1.4%	5.07μs	➖ +0.7%	198.98K/s	➖ -1.4%
Replace complex	3.45μs	➖ +0.6%	3.51μs	➖ +1.7%	289.94K/s	➖ -0.6%
Replace simple	3.59μs	➖ -0.9%	3.65μs	➖ -1.5%	278.30K/s	➖ +0.9%
Reverse	920ns	➖ +0.4%	949ns	➖ +1.0%	1.09M/s	➖ -0.4%
Slug generation	493ns	🟢 -45.9%	506ns	🟢 -45.7%	2.03M/s	🟢 +84.7%
Sort	1.75μs	➖ +1.3%	1.77μs	➖ +0.9%	571.43K/s	➖ -1.3%
Split all	540ns	✅ -4.9%	563ns	🟢 -5.5%	1.85M/s	🟢 +5.2%
Split last index	353ns	🟢 -8.5%	367ns	🟢 -10.9%	2.83M/s	🟢 +9.4%
Strip ANSI	276ns	✅ -2.8%	279ns	🟢 -7.3%	3.62M/s	✅ +3.1%
Substring	1.11μs	➖ -0.2%	1.16μs	➖ +0.8%	898.03K/s	➖ +0.2%
Trim	1.04μs	➖ -0.1%	1.09μs	🟡 +2.2%	964.47K/s	➖ +0.1%
Unique	2.25μs	➖ -0.7%	2.28μs	➖ -0.4%	443.45K/s	➖ +0.7%
Upper	892ns	➖ -0.7%	920ns	➖ -1.0%	1.12M/s	➖ +0.7%
Uppercase all components	2.42μs	🟡 +2.0%	2.81μs	🔴 +17.4%	413.49K/s	➖ -2.0%

Summary

Total templates compared: 28
Improvements: 3 🟢
Regressions: 0 🔴
Neutral: 25 ➖

✨ Performance Improvements

Chain: map complex: 52.0% faster
Slug generation: 45.9% faster
Split last index: 8.5% faster

Legend

🟢 Significant improvement (>5% faster)
✅ Improvement (2-5% faster)
➖ Neutral (<2% change)
🟡 Caution (2-5% slower)
⚠️ Warning (5-10% slower)
🔴 Regression (>10% slower)

Triggered by /bench command
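
A note on reading the report: the time and throughput percentages describe the same measurement from two directions, so a 52% drop in time per path corresponds to roughly a doubling of throughput rather than a 52% gain. The sketch below shows the conversion, assuming throughput is simply the reciprocal of average time per path; `throughput_change` is a hypothetical helper, not the benchmark tool's code.

```rust
/// Relative throughput change implied by a relative change in time per path.
/// Both values are fractions: -0.52 means "52% less time per path".
/// Assumes throughput = 1 / (time per path).
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

Under that assumption, `throughput_change(-0.52)` is about 1.083 and `throughput_change(-0.459)` is about 0.848, in line with the reported +108.2% and +84.7% once rounding of the displayed figures is taken into account.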

lalvarezt self-assigned this Nov 5, 2025

lalvarezt added the enhancement label Nov 5, 2025

lalvarezt force-pushed the simplified-grammar branch 2 times, most recently from a936d14 to c1e6fa5 Compare November 6, 2025 09:32

lalvarezt force-pushed the main branch from 21f1bc1 to b92e7e2 Compare November 6, 2025 16:07

lalvarezt force-pushed the simplified-grammar branch from 5ddd68b to a194856 Compare November 7, 2025 19:18

lalvarezt force-pushed the main branch from dc06069 to df93f9b Compare November 9, 2025 08:22

lalvarezt and others added 5 commits November 9, 2025 13:59

feat(bench): new bechmarking tool

5e02819

fix(bench): formalize initial commit

6033d04

refactor(pest)!: greatly simplified grammar, removed expensive look-a…

e1fca2f

…head that offer no real benefit expect some change in templates that relied on smart escaping, now it's more intentional

perf(grammar): reorder pest grammar operations by usage frequency

cd5a766

Reorganized operation alternatives based on actual benchmark usage patterns to optimize PEG parser performance. Most frequently used operations (split, join, upper, lower, trim, substring, reverse) are now tested first.

perf(operations): add literal string fast paths for filter and replace

92ab5d1

Add fast-path optimizations for filter, filter_not, and replace operations when patterns contain no regex metacharacters. This avoids unnecessary regex compilation and matching for simple literal string operations.
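
The idea behind the last commit can be sketched as follows. This is an illustration under stated assumptions, not the code in 92ab5d1: the names `is_literal_pattern` and `filter_literal` are hypothetical, the metacharacter list is a conservative guess, and the filter is assumed to keep items where the pattern matches anywhere in the string.

```rust
/// True if `pattern` contains none of the common regex metacharacters,
/// in which case an unanchored regex match is equivalent to a substring search.
fn is_literal_pattern(pattern: &str) -> bool {
    !pattern.chars().any(|c| {
        matches!(
            c,
            '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$'
        )
    })
}

/// Fast path: keeps the items that contain `pattern` as a plain substring.
/// Returns `None` when the pattern needs a real regex, so the caller can
/// fall back to compiling one.
fn filter_literal<'a>(items: &[&'a str], pattern: &str) -> Option<Vec<&'a str>> {
    if !is_literal_pattern(pattern) {
        return None;
    }
    Some(
        items
            .iter()
            .copied()
            .filter(|item| item.contains(pattern))
            .collect(),
    )
}
```

Returning `None` for non-literal patterns keeps the fast path separate from the regex path, so the slower route is only taken when the pattern actually requires it.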

lalvarezt force-pushed the simplified-grammar branch from a194856 to 92ab5d1 Compare November 9, 2025 13:50
