Extended ASCII Preservation Fix
2025-07-28
- Switched from Unidecode to ftfy: Replaced aggressive Unicode-to-ASCII conversion with intelligent text fixing
- Preserves Extended ASCII: Now correctly preserves 8-bit extended ASCII characters (128-255) such as é, ñ, and ü
- Smarter Unicode Handling: Only converts problematic Unicode characters while preserving intentional extended ASCII usage
- Updated Dependencies: Replaced the Unidecode dependency with ftfy in requirements.txt
- Maintains AI Artifact Removal: Still removes smart quotes, EM/EN dashes, and other "AI tells" as designed
- Added a check for running inside a VSCode extension so the EOF newline is handled properly; it was being stripped by the extension handler.
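
The behavior described above can be illustrated with a minimal, standard-library-only sketch. It is not the project's implementation and does not call ftfy; the `AI_TELLS` table and the `clean_text` name are hypothetical, chosen only to show the idea of replacing a few typographic characters while leaving characters in the 128-255 range (and any trailing newline) untouched.

```python
# Hypothetical replacement table: the project's actual list of "AI tells"
# is not given in these notes, so this covers only the characters they name.
AI_TELLS = {
    "\u2018": "'",   # left single smart quote
    "\u2019": "'",   # right single smart quote
    "\u201c": '"',   # left double smart quote
    "\u201d": '"',   # right double smart quote
    "\u2013": "-",   # EN dash
    "\u2014": "-",   # EM dash
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(AI_TELLS)


def clean_text(text: str) -> str:
    """Replace the characters in AI_TELLS; leave everything else as-is.

    Characters such as é, ñ, and ü (code points 128-255) are not in the
    table, so they pass through unchanged, as does a trailing newline.
    """
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "caf\u00e9 \u2014 \u201chi\u201d\n"
    print(repr(clean_text(sample)))
```

Unlike a blanket Unicode-to-ASCII transliteration, a targeted table like this changes only the characters it lists, which is the distinction the notes draw between the old and new approach.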