20250907_00-Release - CodExorcism Release - Not just for Codex
- Expanded quote normalization: map additional Unicode quote/prime/angle/fullwidth marks to ASCII ' and " for shell-safe output
- Refined VS Code filter handling: only apply newline compensation in filter mode; never in file-write modes; respect CI/CD env
- Normalize Unicode spaces: replace NBSP (U+00A0), NARROW NBSP (U+202F), EN/EM/THIN spaces (U+2000–U+200A), IDEOGRAPHIC SPACE (U+3000), etc., with ASCII space
- Remove bidi/zero-width controls: strip LRM/RLM, embeddings/overrides/isolates, ZWSP/ZWNJ/ZWJ, BOM
- Note: These artifacts were observed in content produced by Codex/VS Code extensions
- No breaking changes; behavior unchanged for already-clean inputs
- Ellipsis handling and normalization
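
To make the release notes above concrete, here is a minimal Python sketch of this kind of normalization. It is not UnicodeFix's actual implementation: the function name `normalize_text`, the lookup tables, and the choice to turn an ellipsis into three ASCII dots are illustrative assumptions, and only a subset of the code points listed above is covered.

```python
import re

# Hypothetical lookup table; the real tool's mapping is likely larger.
QUOTE_MAP = {
    "\u2018": "'", "\u2019": "'", "\u201a": "'", "\u2032": "'",  # single quotes, prime
    "\u201c": '"', "\u201d": '"', "\u201e": '"', "\u2033": '"',  # double quotes, double prime
    "\u00ab": '"', "\u00bb": '"',                                # angle quotes
    "\uff07": "'", "\uff02": '"',                                # fullwidth marks
}
QUOTE_TABLE = str.maketrans(QUOTE_MAP)

# NBSP, U+2000-U+200A, NARROW NBSP, IDEOGRAPHIC SPACE.
SPACE_RE = re.compile("[\u00a0\u2000-\u200a\u202f\u3000]")

# ZWSP/ZWNJ/ZWJ, LRM/RLM, embeddings/overrides, isolates, BOM.
INVISIBLE_RE = re.compile("[\u200b-\u200f\u202a-\u202e\u2066-\u2069\ufeff]")


def normalize_text(text: str) -> str:
    """Strip invisible controls, then map spaces, quotes, and ellipsis to ASCII."""
    text = INVISIBLE_RE.sub("", text)
    text = SPACE_RE.sub(" ", text)
    text = text.translate(QUOTE_TABLE)
    # Assumption: an ellipsis character becomes three ASCII dots.
    return text.replace("\u2026", "...")
```

Writing the code points as `\u` escapes keeps the sketch itself free of the characters it is meant to remove.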
 - [**Finally - a tool that blasts AI fingerprints, torches those infuriating smart quotes, and leaves your code \& docs squeaky clean for real humans.**](#finally---a-tool-that-blasts-ai-fingerprints-torches-those-infuriating-smart-quotes-and-leaves-your-code--docs-squeaky-clean-for-real-humans)
 - [Why Is This Happening?](#why-is-this-happening)
 - [Installation](#installation)
 - [Usage](#usage)
+- [New options](#new-options)
+- [When to preserve invisible characters (`-i`)](#when-to-preserve-invisible-characters--i)
 - [Brief Examples](#brief-examples)
 - [Pipe / Filter (STDIN to STDOUT)](#pipe--filter-stdin-to-stdout)
 - [Batch Clean](#batch-clean)
 - [In-Place (Safe) Clean](#in-place-safe-clean)
 - [Preserve Temp File for Backup](#preserve-temp-file-for-backup)
 - [Using in vi/vim/macvim](#using-in-vivimmacvim)
 - [What's New / What's Cool](#whats-new--whats-cool)
 - [What's in This Repository](#whats-in-this-repository)
 - [Testing and CI/CD](#testing-and-cicd)
 - [Contributing](#contributing)
@@ -42,6 +49,8 @@ Nearly a thousand people have grabbed it. Nobody's bought me a coffee yet, but h
 Some folks think all this Unicode cruft is a side-effect of generative AI's training data. Others believe it's a deliberate move - baked-in "watermarks" to ID machine-generated text. Either way: these artifacts leave a trail. UnicodeFix wipes it.

+Be careful, professors and reviewers may even start planting Unicode honeypots in starter code or essays - UnicodeFix torches those too. In this "AI Arms Race," `diff` and `vimdiff` are your night-vision goggles.
+
 ---

 ## Installation
@@ -55,6 +64,7 @@ bash setup.sh
 ```

 The `setup.sh` script:
+
 - Creates a Python virtual environment just for UnicodeFix
 - Installs dependencies
 - Adds handy startup config to your `.bashrc` for one-command usage
@@ -69,56 +79,79 @@ For serious environment nerds: [VenvUtil](https://github.com/unixwzrd/venvutil)
-Clean Unicode quirks from text. If no input files are given, reads from STDIN and writes to STDOUT (filter mode). If input files are given, creates cleaned files with .clean before the extension (e.g., foo.txt -> foo.clean.txt). Use -o - to force output to STDOUT for all input files, or -o <file> to specify a single output file
-(only with one input file).
+Clean Unicode quirks from text. If no input files are given, reads from STDIN and writes to STDOUT (filter mode). If input files are given, creates cleaned files with .clean before the extension (e.g., foo.txt -> foo.clean.txt). Use -o - to force output to STDOUT for all input files, or -o <file> to specify a single output file (only with one
+  -Q, --keep-smart-quotes
+                        Preserve Unicode smart quotes (do not convert to ASCII)
+  -D, --keep-dashes     Preserve Unicode EN/EM dashes (do not convert to ASCII)
+  -n, --no-newline      Do not add a newline at the end of the output file (suppress final newline).
   -o OUTPUT, --output OUTPUT
                         Output file name, or '-' for STDOUT. Only valid with one input file, or use '-' for STDOUT with multiple files.
   -t, --temp            In-place cleaning: Move each input file to .tmp, clean it, write cleaned output to original name, and delete .tmp after success.
   -p, --preserve-tmp    With -t, preserve the .tmp file after cleaning (do not delete it). Useful for backup or manual recovery.
-  -n, --no-newline      Do not add a newline at the end of the output file (suppress final newline).
 ```
+### New options
+
+- `-Q`, `--keep-smart-quotes`: Preserve Unicode smart quotes (curly single/double quotes). Useful when preparing prose/blog posts where typographic quotes are intentional. Default behavior converts them to ASCII for shell/CI safety.
+- `-D`, `--keep-dashes`: Preserve EN/EM dashes. Useful when stylistic punctuation is desired in prose. Default behavior converts EM dash to ` - ` and EN dash to `-`.
+
+#### When to preserve invisible characters (`-i`)
+
+In most code/CI workflows, invisible/bidi controls are accidental and should be removed (default). Rare cases to preserve (`-i`):
+
+- Linguistic text where ZWJ/ZWNJ influence shaping
+- Intentional watermarks/markers in text
+- Forensic/debug inspections before deciding what to strip
+
 ## Brief Examples

 ### Pipe / Filter (STDIN to STDOUT)
-```
+
+```bash
 cat file.txt | cleanup-text > cleaned.txt
 ```

 ### Batch Clean
-```
+
+```bash
 cleanup-text *.txt
 ```

 ### In-Place (Safe) Clean
-```
+
+```bash
 cleanup-text -t myfile.txt
 ```

 ### Preserve Temp File for Backup
-```
+
+```bash
 cleanup-text -t -p myfile.txt
 ```

 ### Using in vi/vim/macvim

-```
+```vim
 :%!cleanup-text
 ```

-You can run it from Vim, VS Code in Vim mode, or as a pre-commit. Use it for email, blog posts, whatever. Ignore the naysayers - this is *real-world convenience.*
+Works great for vi/Vim purists, VS Code hipsters, or anyone who just wants their text to behave like text.
+Also handy if you’re trying to slip your AI-generated code past your CS prof without curly quotes giving you away.
+
+You can run it from Vim, VS Code in Vim mode, or as a pre-commit. Use it for email, blog posts, whatever. Ignore the naysayers - this is _real-world convenience._

 See [cleanup-text.md](docs/cleanup-text.md) for deeper dives and arcane options.
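
The `-t`/`-p` flow described in the usage text of the hunk above (move the input to `.tmp`, clean it, write the result under the original name, then delete or keep the `.tmp`) might look roughly like this in Python. This is a sketch under assumptions, not the tool's code: `clean_in_place`, its `clean` callback, the exact `.tmp` naming (suffix appended to the full file name), the restore-on-failure step, and UTF-8 I/O are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def clean_in_place(path: str, clean: Callable[[str], str],
                   preserve_tmp: bool = False) -> Path:
    """Hypothetical sketch of -t (and, with preserve_tmp=True, -p)."""
    original = Path(path)
    tmp = original.with_name(original.name + ".tmp")  # assumed naming scheme
    original.rename(tmp)  # move the input aside before touching it
    try:
        cleaned = clean(tmp.read_text(encoding="utf-8"))
        original.write_text(cleaned, encoding="utf-8")
    except Exception:
        # Put the untouched input back if cleaning or writing fails.
        tmp.replace(original)
        raise
    if not preserve_tmp:
        tmp.unlink()  # plain -t: drop the backup once the write succeeded
    return tmp
```

Moving the file aside first means the original bytes survive until the cleaned copy has been written, which is presumably why the README calls this mode "safe".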
@@ -129,15 +162,28 @@ See [cleanup-text.md](docs/cleanup-text.md) for deeper dives and arcane options.
 ## What's New / What's Cool

-- **Vaporizes invisible Unicode (unless you tell it not to)**
+### CodexExorcism Release (Sept 2025)
+
+Exorcise your code from VS Code/Codex’s funky Unicode artifacts (NBSPs, bidi controls, smart quotes).
+
+- **Safer EOF handling in VS Code filter mode**
+- **Normalizes more sneaky Codex/AI fingerprints**
+- **Ellipsis Eradication**
+
+### Previous Releases
+
 - **Normalizes EM/EN dashes to true ASCII - no more AI " - " nonsense**
 - **Wipes AI "tells," watermarks, and digital fingerprints**
 - **Fixes trailing whitespace, normalizes newlines, burns the digital junk**
 - **Portable (Python 3.7+), cross-platform**
 - **Integrated macOS Shortcut for right-click cleaning in Finder**
 - **Can be used in CI/CD - but also by normal humans, not just pipeline freaks**

-> *Fun fact*: Even Python will execute code with "curly quotes." Your IDE, email client, and browser all sneak these in. UnicodeFix hunts them down and torches them.
+> *Fun fact*: Even Python will execute code with "curly quotes." Your IDE, email client, and browser all sneak these in. UnicodeFix hunts them down and torches them, ...so your coding homework looks *lovingly hand-crafted* at 4:37 a.m., rather than LLM spawn.
+
+### Keep It Fresh!
+
+Pull requests/issues always welcome - especially if your AI friend slipped a new weird Unicode gremlin past me, I found a few more while preparing this release too...🙄

 ---
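
The default dash handling this README describes (EM dash to ` - `, EN dash to `-`, unless `-D`/`--keep-dashes` is given) can be sketched in a few lines of Python. The function name `normalize_dashes` and its `keep_dashes` parameter are hypothetical stand-ins for the CLI flag. The sketch also does not collapse spaces that already surround a dash, a detail the README does not specify.

```python
def normalize_dashes(text: str, keep_dashes: bool = False) -> str:
    """Hypothetical sketch of the default dash conversion."""
    if keep_dashes:
        # Mirrors -D/--keep-dashes: leave typographic dashes untouched.
        return text
    # EM dash (U+2014) becomes a spaced hyphen; EN dash (U+2013) a bare hyphen.
    return text.replace("\u2014", " - ").replace("\u2013", "-")
```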
@@ -147,7 +193,7 @@ UnicodeFix ships with a macOS Shortcut for direct Finder integration.
 Right-click files, pick a Quick Action, and - bam - no terminal required.

-### To add the Shortcut:
+### To add the Shortcut

 1. Open the **Shortcuts** app.
 2. Choose `File -> Import`.
@@ -203,14 +249,13 @@ Pull requests with attitude, creativity, and clean diffs appreciated.
 ## Support This and Other Projects

-If UnicodeFix (or my other projects) saved your bacon or made you smile,
-please consider fueling my caffeine habit and indie dev obsession:
+If UnicodeFix (or my other projects) saved your bacon or made you smile, please consider fueling my caffeine habit and indie dev obsession...

 - [Patreon](https://patreon.com/unixwzrd)
 - [Ko-Fi](https://ko-fi.com/unixwzrd)
 - [Buy Me a Coffee](https://buymeacoffee.com/unixwzrd)

-One coffee = one more tool released to the wild.
+Quite a bit of effort goes into preparing these releases. *One coffee = one more tool released to the wild...*🤔