|
1 | 1 | # UnicodeFix |
2 | 2 |
|
3 | | -UnicodeFix normalizes problematic Unicode artifacts into clean ASCII equivalents. |
4 | | - |
5 | | -This project was created to address the increasing frequency of invisible and typographic Unicode characters causing issues in code, configuration files, AI detection, and document processing. |
6 | | - |
7 | | -**This is an early release. Further polishing and enhancements will follow.** |
| 3 | + |
8 | 4 |
|
9 | 5 | - [UnicodeFix](#unicodefix) |
| 6 | + - [**Finally - a tool that blasts AI fingerprints, torches those infuriating smart quotes, and leaves your code \& docs squeaky clean for real humans.**](#finally---a-tool-that-blasts-ai-fingerprints-torches-those-infuriating-smart-quotes-and-leaves-your-code--docs-squeaky-clean-for-real-humans) |
| 7 | + - [Why Is This Happening?](#why-is-this-happening) |
10 | 8 | - [Installation](#installation) |
11 | 9 | - [Usage](#usage) |
| 10 | + - [Brief Examples](#brief-examples) |
12 | 11 | - [Pipe / Filter (STDIN to STDOUT)](#pipe--filter-stdin-to-stdout) |
| 12 | + - [Batch Clean](#batch-clean) |
| 13 | + - [In-Place (Safe) Clean](#in-place-safe-clean) |
| 14 | + - [Preserve Temp File for Backup](#preserve-temp-file-for-backup) |
13 | 15 | - [Using in vi/vim/macvim](#using-in-vivimmacvim) |
| 16 | + - [What's New / What's Cool](#whats-new--whats-cool) |
14 | 17 | - [Shortcut for macOS](#shortcut-for-macos) |
15 | 18 | - [To add the Shortcut:](#to-add-the-shortcut) |
16 | 19 | - [What's in This Repository](#whats-in-this-repository) |
| 20 | + - [Testing and CI/CD](#testing-and-cicd) |
17 | 21 | - [Contributing](#contributing) |
18 | 22 | - [Support This and Other Projects](#support-this-and-other-projects) |
19 | 23 | - [Changelog](#changelog) |
20 | | - - [2025-04-27](#2025-04-27) |
21 | | - - [2025-04-26](#2025-04-26) |
22 | 24 | - [License](#license) |
23 | 25 |
|
| 26 | +--- |
| 27 | + |
| 28 | +### **Finally - a tool that blasts AI fingerprints, torches those infuriating smart quotes, and leaves your code & docs squeaky clean for real humans.** |
| 29 | + |
| 30 | +Ever open up a file and instantly know it came from ChatGPT, Copilot, or one of their AI cousins? (Yeah, so can everyone else now.) |
| 31 | +UnicodeFix vaporizes all the weird dashes, curly quotes, invisible space ninjas, and digital "tells" that out you as an AI user - or just make your stuff fail linters and code reviews. |
| 32 | + |
| 33 | +**Whether you're a student, a dev, or an open-source rebel: this is your "eraser for AI breadcrumbs."** |
| 34 | + |
| 35 | +_Yes, it helps students cheat on their homework._ |
| 36 | +It also makes blog posts and AI-proofed emails look like you sweated over every character. |
| 37 | +Nearly a thousand people have grabbed it. Nobody's bought me a coffee yet, but hey… there's a first time for everything. |
| 38 | + |
| 39 | +--- |
| 40 | + |
| 41 | +## Why Is This Happening? |
| 42 | + |
| 43 | +Some folks think all this Unicode cruft is a side-effect of generative AI's training data. Others believe it's a deliberate move - baked-in "watermarks" to ID machine-generated text. Either way: these artifacts leave a trail. UnicodeFix wipes it. |
| 44 | + |
| 45 | +--- |
| 46 | + |
24 | 47 | ## Installation |
25 | 48 |
|
26 | 49 | Clone the repository and run the setup script: |
27 | 50 |
|
28 | | -```bash |
| 51 | +``` |
29 | 52 | git clone https://github.com/unixwzrd/UnicodeFix.git |
30 | 53 | cd UnicodeFix |
31 | 54 | bash setup.sh |
32 | 55 | ``` |
33 | 56 |
|
34 | | -The \`setup.sh\` script: |
| 57 | +The `setup.sh` script: |
| 58 | +- Creates a Python virtual environment just for UnicodeFix |
| 59 | +- Installs dependencies |
| 60 | +- Adds handy startup config to your `.bashrc` for one-command usage |
35 | 61 |
|
36 | | -- Creates a dedicated Python virtual environment |
37 | | -- Installs required dependencies |
38 | | -- Adds startup configuration to your \`.bashrc\` for easier usage |
| 62 | +See [setup.sh](setup.sh) for the nitty-gritty. |
39 | 63 |
|
40 | | -You can review [setup.sh](setup.sh) to see exactly what is modified. |
| 64 | +For serious environment nerds: [VenvUtil](https://github.com/unixwzrd/venvutil) is my full-featured Python env toolkit. |
41 | 65 |
|
42 | | -I also maintain a broader toolset for virtual environment management here: [VenvUtil](https://github.com/unixwzrd/venvutil), which may be of interest for more advanced users. |
| 66 | +--- |
43 | 67 |
|
44 | 68 | ## Usage |
45 | 69 |
|
46 | 70 | Once installed and activated: |
47 | 71 |
|
48 | | -```bash |
49 | | -(python-3.10-PA-dev) [unixwzrd@xanax: UnicodeFix]$ python bin/cleanup-text.py --help |
50 | | -usage: cleanup-text.py [-h] [infile ...] |
| 72 | +``` |
| 73 | +(python-3.10-PA-dev) [unixwzrd@xanax: UnicodeFix]$ cleanup-text --help |
| 74 | +
|
| 75 | +usage: cleanup-text [-h] [-i] [-o OUTPUT] [-t] [-p] [-n] [infile ...] |
51 | 76 |
|
52 | | -Clean Unicode quirks from text. |
| 77 | +Clean Unicode quirks from text. If no input files are given, reads from STDIN and writes to STDOUT (filter mode). If input files are given, creates cleaned files with .clean before the extension (e.g., foo.txt -> foo.clean.txt). Use -o - to force output to STDOUT for all input files, or -o <file> to specify a single output file |
| 78 | +(only with one input file). |
53 | 79 |
|
54 | 80 | positional arguments: |
55 | 81 | infile Input file(s) |
56 | 82 |
|
57 | 83 | options: |
58 | | - -h, --help Show this help message and exit |
| 84 | + -h, --help show this help message and exit |
| 85 | + -i, --invisible Preserve invisible Unicode characters (zero-width, non-breaking, etc.) |
| 86 | + -o OUTPUT, --output OUTPUT |
| 87 | + Output file name, or '-' for STDOUT. Only valid with one input file, or use '-' for STDOUT with multiple files. |
| 88 | + -t, --temp In-place cleaning: Move each input file to .tmp, clean it, write cleaned output to original name, and delete .tmp after success. |
| 89 | + -p, --preserve-tmp With -t, preserve the .tmp file after cleaning (do not delete it). Useful for backup or manual recovery. |
| 90 | + -n, --no-newline Do not add a newline at the end of the output file (suppress final newline). |
59 | 91 | ``` |
60 | 92 |
|
| 93 | +## Brief Examples |
| 94 | + |
61 | 95 | ### Pipe / Filter (STDIN to STDOUT) |
| 96 | +``` |
| 97 | +cat file.txt | cleanup-text > cleaned.txt |
| 98 | +``` |
62 | 99 |
|
63 | | -UnicodeFix can operate as a standard UNIX pipe: |
| 100 | +### Batch Clean |
| 101 | +``` |
| 102 | +cleanup-text *.txt |
| 103 | +``` |
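The help text above says each input file gets a cleaned copy with `.clean` before the extension (`foo.txt -> foo.clean.txt`). Here is a minimal sketch of that naming rule. The helper name `clean_name` is hypothetical, and the handling of files without an extension is an assumption, not something read out of `cleanup-text.py`.

```python
from pathlib import Path


def clean_name(path):
    """Return the output path with '.clean' inserted before the extension.

    Hypothetical helper for illustration; not taken from cleanup-text.py.
    """
    p = Path(path)
    # p.stem is the file name without its last suffix, p.suffix is that suffix.
    # A file with no extension simply gets '.clean' appended (an assumption).
    return p.with_name(p.stem + ".clean" + p.suffix)


if __name__ == "__main__":
    for name in ("foo.txt", "notes", "archive.tar.gz"):
        print(name, "->", clean_name(name).name)
```

Only the last suffix is treated as the extension here, so multi-part extensions such as `.tar.gz` end up with `.clean` in the middle.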
64 | 104 |
|
65 | | -```bash |
66 | | -cat file.txt | cleanup-text > cleaned.txt |
| 105 | +### In-Place (Safe) Clean |
| 106 | +``` |
| 107 | +cleanup-text -t myfile.txt |
67 | 108 | ``` |
68 | 109 |
|
69 | | -If no input file arguments are given, it automatically reads from standard input and writes to standard output. |
| 110 | +### Preserve Temp File for Backup |
| 111 | +``` |
| 112 | +cleanup-text -t -p myfile.txt |
| 113 | +``` |
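The `-t` and `-p` options are described in the help text as: move the input to `.tmp`, clean it, write the result to the original name, then delete (or keep) the `.tmp` file. A rough sketch of that pattern follows. The function name `clean_in_place`, the `cleaner` callback, the `<name>.tmp` naming, and the restore-on-failure step are all assumptions made for illustration; the real script may do this differently.

```python
import os
from pathlib import Path


def clean_in_place(path, cleaner, preserve_tmp=False):
    """Clean a file in place, keeping the original as <name>.tmp until done.

    Hypothetical sketch of the -t / -p behaviour; not the actual implementation.
    `cleaner` is any function that takes text and returns cleaned text.
    """
    target = Path(path)
    tmp = target.with_name(target.name + ".tmp")  # assumed temp-file naming

    os.replace(target, tmp)  # move the original aside first
    try:
        cleaned = cleaner(tmp.read_text(encoding="utf-8"))
        target.write_text(cleaned, encoding="utf-8")
    except Exception:
        os.replace(tmp, target)  # put the original back if anything fails
        raise

    if not preserve_tmp:
        tmp.unlink()  # -t alone: drop the temp copy after success
    return target
```

Moving the original aside before writing means a failed run can always be rolled back from the `.tmp` copy, which is what makes the "safe" in the heading plausible.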
70 | 114 |
|
71 | 115 | ### Using in vi/vim/macvim |
72 | 116 |
|
73 | | -You can run UnicodeFix as a filter within vi/vim/macvim: |
74 | | - |
75 | | -```vim |
| 117 | +``` |
76 | 118 | :%!cleanup-text |
77 | 119 | ``` |
78 | 120 |
|
79 | | -This command rewrites the entire buffer with cleaned text. |
| 121 | +You can run it from Vim, VS Code in Vim mode, or as a pre-commit hook. Use it for email, blog posts, whatever. Ignore the naysayers - this is *real-world convenience.* |
| 122 | + |
| 123 | +See [cleanup-text.md](docs/cleanup-text.md) for deeper dives and arcane options. |
| 124 | + |
| 125 | +- **Make sure your Python environment is activated** before launching your editor, or wrap it in a shell script that does it for you. |
| 126 | +- Adjust your editor's shell settings as needed for best results. |
80 | 127 |
|
81 | | -**Note**: |
82 | | -- Ensure your virtual environment is activated before launching your editor, or |
83 | | -- Use a shell wrapper that sources your \`.bashrc\` and activates the environment automatically. |
| 128 | +--- |
84 | 129 |
|
85 | | -Depending on how you manage virtual environments, you may need to adjust your editor’s shell invocation settings. |
| 130 | +## What's New / What's Cool |
| 131 | + |
| 132 | +- **Vaporizes invisible Unicode (unless you tell it not to)** |
| 133 | +- **Normalizes EM/EN dashes to true ASCII - no more AI " - " nonsense** |
| 134 | +- **Wipes AI "tells," watermarks, and digital fingerprints** |
| 135 | +- **Fixes trailing whitespace, normalizes newlines, burns the digital junk** |
| 136 | +- **Portable (Python 3.7+), cross-platform** |
| 137 | +- **Integrated macOS Shortcut for right-click cleaning in Finder** |
| 138 | +- **Can be used in CI/CD - but also by normal humans, not just pipeline freaks** |
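To make the list above concrete, here is a minimal sketch of the kind of normalization it describes: smart quotes and dashes mapped to ASCII, invisible characters dropped unless asked to keep them, trailing whitespace trimmed, and newlines normalized. The character table, the `clean_text` name, and the `keep_invisible` parameter (loosely mirroring `-i`) are assumptions for illustration; the actual `cleanup-text.py` may cover a different set of characters.

```python
import re

# Hypothetical replacement table; the real tool may handle more (or other) characters.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # EN and EM dashes
    "\u2026": "...",                # ellipsis
}

# Zero-width space/joiners, word joiner, and BOM.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def clean_text(text, keep_invisible=False):
    """Return an ASCII-friendlier copy of `text` (illustrative sketch only)."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    if not keep_invisible:
        text = ZERO_WIDTH.sub("", text)       # drop zero-width characters
        text = text.replace("\u00a0", " ")    # non-breaking space -> plain space
    # Strip trailing whitespace per line and normalize line endings to "\n".
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(clean_text("\u201chi\u201d\u200b \u2014 ok  \r\n"), end="")
```

This sketch always ends the output with a single newline; the real tool exposes `-n` to suppress that.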
| 139 | + |
| 140 | +> *Fun fact*: Python raises a `SyntaxError` when "curly quotes" are used as string delimiters, yet your IDE, email client, and browser all sneak these in. UnicodeFix hunts them down and torches them. |
| 141 | +
|
| 142 | +--- |
86 | 143 |
|
87 | 144 | ## Shortcut for macOS |
88 | 145 |
|
89 | | -UnicodeFix includes a macOS Shortcut for direct Finder integration. |
| 146 | +UnicodeFix ships with a macOS Shortcut for direct Finder integration. |
90 | 147 |
|
91 | | -You can right-click one or more files and select a Quick Action to clean Unicode quirks without opening a terminal. |
| 148 | +Right-click files, pick a Quick Action, and - bam - no terminal required. |
92 | 149 |
|
93 | 150 | ### To add the Shortcut: |
94 | 151 |
|
95 | 152 | 1. Open the **Shortcuts** app. |
96 | | -2. Navigate to \`File -> Import\`. |
97 | | - |
| 153 | +2. Choose `File -> Import`. |
98 | 154 |  |
| 155 | +3. Select the Shortcut in `macOS/Strip Unicode.shortcut`. |
| 156 | +  |
| 157 | +4. Edit it to point to your local `cleanup-text.py`. |
| 158 | +  |
| 159 | +5. Relaunch Finder (`Cmd+Opt+Esc` → select Finder → Relaunch) if needed. |
| 160 | +6. After setup, right-click files, choose `Quick Actions`, select `Strip Unicode`. |
| 161 | +  |
99 | 162 |
|
100 | | -3. Select the Shortcut file located in \`macOS/Strip Unicode.shortcut\`. |
| 163 | +--- |
101 | 164 |
|
102 | | -  |
| 165 | +## What's in This Repository |
103 | 166 |
|
104 | | -4. Edit the Shortcut to point to your local installation of \`cleanup-text.py\`. |
| 167 | +- [bin/cleanup-text.py](bin/cleanup-text.py) - Main cleaning script |
| 168 | +- [bin/cleanup-text](bin/cleanup-text) - Symlink for CLI usage |
| 169 | +- [setup.sh](setup.sh) - Easy setup and env configuration |
| 170 | +- [requirements.txt](requirements.txt) - Python dependencies |
| 171 | +- [macOS/](macOS/) - Shortcuts, scripts for Finder |
| 172 | +- [data/](data/) - Example test files |
| 173 | +- [test/](test/) - Automated test suite for all features/edge cases |
| 174 | +- [docs/](docs/) - Documentation and screenshots |
| 175 | +- [LICENSE](LICENSE) |
| 176 | +- [README.md](README.md) - This file |
105 | 177 |
|
106 | | -  |
| 178 | +--- |
107 | 179 |
|
108 | | -5. You may need to relaunch Finder (\`Command+Option+Esc\` → Select Finder → Relaunch). |
| 180 | +## Testing and CI/CD |
109 | 181 |
|
110 | | -6. After setup, right-click selected files, choose \`Quick Actions\`, and select \`Strip Unicode\`. |
| 182 | +UnicodeFix comes with a full, automated test suite: |
111 | 183 |
|
112 | | -  |
| 184 | +- Runs every feature & scenario on files in `data/` |
| 185 | +- Outputs to `test_output/` (by scenario, with diffs and word counts) |
| 186 | +- Clean up with: `./test/test_all.sh clean` |
| 187 | +- Plug into your CI/CD pipeline or just use as a "paranoia check" before shipping anything |
113 | 188 |
|
114 | | -## What's in This Repository |
| 189 | +**Pro tip:** Run the tests before you merge, publish, or email a "final" version. |
| 190 | + |
| 191 | +See [docs/test-suite.md](docs/test-suite.md) for the deep dive. |
115 | 192 |
|
116 | | -- [bin/cleanup-text.py](bin/cleanup-text.py) — Main cleaning script |
117 | | -- [bin/cleanup-text](bin/cleanup-text) — Symlink for command-line usage |
118 | | -- [setup.sh](setup.sh) — Virtual environment setup script |
119 | | -- [requirements.txt](requirements.txt) — Python dependencies |
120 | | -- [macOS/](macOS/) — macOS Shortcut for Finder integration |
121 | | -- [data/](data/) — Example test files with Unicode artifacts |
122 | | -- [docs/](docs/) — Documentation and screenshots |
123 | | -- [LICENSE](LICENSE) — License information |
124 | | -- [README.md](README.md) — This file |
| 193 | +--- |
125 | 194 |
|
126 | 195 | ## Contributing |
127 | 196 |
|
128 | | -Feedback, testing, bug reports, and pull requests are welcome. |
| 197 | +Feedback, bug reports, and patches welcome. |
| 198 | + |
| 199 | +If you've got a better integration path for your favorite platform, let's make it happen. |
| 200 | +Pull requests with attitude, creativity, and clean diffs appreciated. |
129 | 201 |
|
130 | | -If you find a better integration path for Linux or Windows platforms, feel free to open an issue or contribute a patch. |
| 202 | +--- |
131 | 203 |
|
132 | 204 | ## Support This and Other Projects |
133 | 205 |
|
134 | | -If you find UnicodeFix or my other projects valuable, please consider supporting continued development: |
| 206 | +If UnicodeFix (or my other projects) saved your bacon or made you smile, |
| 207 | +please consider fueling my caffeine habit and indie dev obsession: |
135 | 208 |
|
136 | 209 | - [Patreon](https://patreon.com/unixwzrd) |
137 | 210 | - [Ko-Fi](https://ko-fi.com/unixwzrd) |
138 | 211 | - [Buy Me a Coffee](https://buymeacoffee.com/unixwzrd) |
139 | 212 |
|
140 | | -Thank you for your support. |
| 213 | +One coffee = one more tool released to the wild. |
| 214 | + |
| 215 | +Thank you for keeping solo development alive! |
| 216 | + |
| 217 | +--- |
141 | 218 |
|
142 | 219 | ## Changelog |
143 | 220 |
|
144 | | -### 2025-04-27 |
145 | | -- Fixed behavior when processing STDIN pipes |
146 | | -- Added trailing whitespace and blank line normalization |
147 | | -- Added shell script wrapper for easier activation from editors |
| 221 | +**See [CHANGELOG.md](CHANGELOG.md) for the latest drop.** |
148 | 222 |
|
149 | | -### 2025-04-26 |
150 | | -- Initial release |
| 223 | +--- |
151 | 224 |
|
152 | 225 | ## License |
153 | 226 |
|
154 | | -Copyright 2025 |
| 227 | +Copyright 2025 |
155 | 228 | [unixwzrd@unixwzrd.ai](mailto:unixwzrd@unixwzrd.ai) |
156 | 229 |
|
157 | 230 | [MIT License](LICENSE) |
|