Commit abb84db

20250722_00 Enough of Your AI Nonsense Edition
- Major update, new options
- Smarter removal of Unicode and conversion
- More coding artifacts removed
- less lint
1 parent 7c0b91e commit abb84db

15 files changed (+666, -101 lines)
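The commit message promises smarter removal and conversion of Unicode. The part of the diff shown here does not include `bin/cleanup-text.py`, so the following is only a minimal Python sketch of that kind of normalization, assuming a hand-picked character table. The names `normalize_text`, `TYPOGRAPHIC`, and `INVISIBLE` are hypothetical, and the real script's mapping may differ.

```python
# Hypothetical sketch of Unicode normalization; the character tables below are
# assumptions for illustration, not the mapping used by cleanup-text.py.

# Typographic characters converted to plain ASCII equivalents.
TYPOGRAPHIC = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# Invisible or look-alike spacing characters (the kind -i is described as preserving).
INVISIBLE = {
    "\u200b": None,   # zero width space (deleted)
    "\u200c": None,   # zero width non-joiner (deleted)
    "\u200d": None,   # zero width joiner (deleted)
    "\u2060": None,   # word joiner (deleted)
    "\ufeff": None,   # zero width no-break space / byte order mark (deleted)
    "\u00a0": " ",    # no-break space (converted to a regular space)
}


def normalize_text(text: str, keep_invisible: bool = False) -> str:
    """Convert typographic characters to ASCII and drop invisible ones.

    keep_invisible mirrors the idea behind the -i/--invisible flag.
    """
    table = dict(TYPOGRAPHIC)
    if not keep_invisible:
        table.update(INVISIBLE)
    text = text.translate(str.maketrans(table))
    # Strip trailing whitespace from every line (this also drops a stray "\r"
    # from CRLF endings); whitespace-only lines become empty but are kept.
    return "\n".join(line.rstrip() for line in text.split("\n"))


if __name__ == "__main__":
    print(normalize_text("\u201cHello\u201d \u2014 it\u2019s\u200b fine   "))
```

One design caveat: `str.rstrip()` treats the no-break space as whitespace, so a trailing one is removed even when `keep_invisible` is set.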

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -256,3 +256,5 @@ tmp/
 
 bashexp*.txt
 
+backup/
+test_output/
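The new `test_output/` entry keeps the test suite's results out of version control. According to the README changes in this commit, the suite writes per-scenario diffs and word counts there. Below is a minimal sketch of that kind of comparison using Python's standard `difflib` module; `compare_outputs` is a hypothetical helper and is not part of `test/test_all.sh`.

```python
import difflib


def compare_outputs(original: str, cleaned: str, name: str = "sample.txt") -> dict:
    """Hypothetical helper: summarize what cleaning changed in one file.

    Returns a unified diff plus whitespace-delimited word counts before and
    after, roughly the kind of per-scenario data described for test_output/.
    """
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        cleaned.splitlines(keepends=True),
        fromfile=f"{name} (original)",
        tofile=f"{name} (cleaned)",
    )
    return {
        "diff": "".join(diff_lines),
        "words_before": len(original.split()),
        "words_after": len(cleaned.split()),
    }


if __name__ == "__main__":
    report = compare_outputs("It\u2019s \u201cdone\u201d\n", "It's \"done\"\n")
    print(report["diff"], end="")
    print("words:", report["words_before"], "->", report["words_after"])
```

A word count that stays the same before and after is a cheap signal that cleaning changed characters rather than content; the diff shows exactly which characters.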

CHANGELOG.md

Lines changed: 21 additions & 2 deletions
@@ -1,13 +1,32 @@
 # Changelog for UnicodeFix
 
+## 2025-07-22
 
-## 2025-04-27 20250427_01-update
+**Major Release – “Enough of Your AI Nonsense” Edition**
+
+- **CLI Supercharged:** Added new power flags:
+`-i` / `--invisible` (preserve zero-width/invisible Unicode)
+`-n` / `--no-newline` (suppress final newline at EOF)
+`-o` / `--output` (custom output file or STDOUT)
+`-t` / `--temp` (safe in-place cleaning)
+`-p` / `--preserve-tmp` (backup your .tmp files if you’re paranoid)
+- **AI Artifact Killer:** Cranked up removal of invisible Unicode, “AI tells,” EM/EN dashes, curly/smart quotes, and digital fingerprints from text, code, and prose.
+- **Cleaner Output:** Output files now use `.clean` before the extension for extra safety.
+- **Help & Error Output:** Help and error messages are clearer, less cryptic, and actually readable.
+- **Epic Test Suite:** All-new `test/test_all.sh` script automates batch tests, diffs, word counts, and deep-clean scenarios—review everything in `test_output/` before you ship or commit.
+- **Docs & Best Practices:** README and docs overhauled with real-world examples, pro tips, and fresh install/usage details (plus a *lot* more attitude).
+- **CI/CD Ready:** Use in your pre-commit, CI pipeline, or just blast through homework/AI-proofreading artifacts for fun.
+- **Because I got tired of looking at garbage code.**
+
+*If you’re tired of code and docs that look like they were written by a bot, this release is for you.*
 
+## 2025-04-27 20250427_01-update
 - Update README
 - Update cleanup-text.py to handle trailing whitespace
 - Whitespace on empty lines (newline preserved)
 
 ## 2025-04-26 20250427_00-release
+- Added STDIO pipe handling as a filter
 
+## 2025-04-26
 - Initial release
-- Added STDIO pipe handling as a filter
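The CHANGELOG lists the new flags. The sketch below shows one way that command-line surface could be declared with Python's `argparse`. It also covers the `.clean` naming rule from the updated help text (`foo.txt -> foo.clean.txt`) and the rule that `-o FILE` is only valid with one input file. `build_parser`, `validate_args`, and `clean_output_name` are hypothetical names, because the actual parser in `cleanup-text.py` is not part of this diff.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in this commit."""
    parser = argparse.ArgumentParser(
        prog="cleanup-text", description="Clean Unicode quirks from text."
    )
    parser.add_argument("infile", nargs="*",
                        help="Input file(s); none means STDIN -> STDOUT")
    parser.add_argument("-i", "--invisible", action="store_true",
                        help="Preserve invisible Unicode characters")
    parser.add_argument("-o", "--output",
                        help="Output file name, or '-' for STDOUT")
    parser.add_argument("-t", "--temp", action="store_true",
                        help="In-place cleaning via a .tmp file")
    parser.add_argument("-p", "--preserve-tmp", action="store_true",
                        help="With -t, keep the .tmp file after cleaning")
    parser.add_argument("-n", "--no-newline", action="store_true",
                        help="Do not add a final newline")
    return parser


def validate_args(args: argparse.Namespace, parser: argparse.ArgumentParser) -> None:
    """Enforce the documented rule: -o FILE only with exactly one input file."""
    if args.output not in (None, "-") and len(args.infile) != 1:
        parser.error("-o FILE is only valid with exactly one input file")


def clean_output_name(path: str) -> str:
    """Insert .clean before the extension: foo.txt -> foo.clean.txt."""
    root, ext = os.path.splitext(path)
    return f"{root}.clean{ext}"


if __name__ == "__main__":
    p = build_parser()
    a = p.parse_args()
    validate_args(a, p)
    for name in a.infile:
        if a.output == "-":
            target = "STDOUT"
        else:
            target = a.output or clean_output_name(name)
        print(f"{name} -> {target}")
```

`-o -` with several input files passes validation, which matches the help text's "use '-' for STDOUT with multiple files". `parser.error` exits with status 2 for the rejected case.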

README.md

Lines changed: 136 additions & 63 deletions
@@ -1,157 +1,230 @@
 # UnicodeFix
 
-UnicodeFix normalizes problematic Unicode artifacts into clean ASCII equivalents.
-
-This project was created to address the increasing frequency of invisible and typographic Unicode characters causing issues in code, configuration files, AI detection, and document processing.
-
-**This is an early release. Further polishing and enhancements will follow.**
+![UnicodeFix Hero Image](docs/controlling-unicode.png)
 
 - [UnicodeFix](#unicodefix)
+- [**Finally - a tool that blasts AI fingerprints, torches those infuriating smart quotes, and leaves your code \& docs squeaky clean for real humans.**](#finally---a-tool-that-blasts-ai-fingerprints-torches-those-infuriating-smart-quotes-and-leaves-your-code--docs-squeaky-clean-for-real-humans)
+- [Why Is This Happening?](#why-is-this-happening)
 - [Installation](#installation)
 - [Usage](#usage)
+- [Brief Examples](#brief-examples)
 - [Pipe / Filter (STDIN to STDOUT)](#pipe--filter-stdin-to-stdout)
+- [Batch Clean](#batch-clean)
+- [In-Place (Safe) Clean](#in-place-safe-clean)
+- [Preserve Temp File for Backup](#preserve-temp-file-for-backup)
 - [Using in vi/vim/macvim](#using-in-vivimmacvim)
+- [What's New / What's Cool](#whats-new--whats-cool)
 - [Shortcut for macOS](#shortcut-for-macos)
 - [To add the Shortcut:](#to-add-the-shortcut)
 - [What's in This Repository](#whats-in-this-repository)
+- [Testing and CI/CD](#testing-and-cicd)
 - [Contributing](#contributing)
 - [Support This and Other Projects](#support-this-and-other-projects)
 - [Changelog](#changelog)
-- [2025-04-27](#2025-04-27)
-- [2025-04-26](#2025-04-26)
 - [License](#license)
 
+---
+
+### **Finally - a tool that blasts AI fingerprints, torches those infuriating smart quotes, and leaves your code & docs squeaky clean for real humans.**
+
+Ever open up a file and instantly know it came from ChatGPT, Copilot, or one of their AI cousins? (Yeah, so can everyone else now.)
+UnicodeFix vaporizes all the weird dashes, curly quotes, invisible space ninjas, and digital "tells" that out you as an AI user - or just make your stuff fail linters and code reviews.
+
+**Whether you're a student, a dev, or an open-source rebel: this is your "eraser for AI breadcrumbs."**
+
+_Yes, it helps students cheat on their homework._
+It also makes blog posts and AI-proofed emails look like you sweated over every character.
+Nearly a thousand people have grabbed it. Nobody's bought me a coffee yet, but hey… there's a first time for everything.
+
+---
+
+## Why Is This Happening?
+
+Some folks think all this Unicode cruft is a side-effect of generative AI's training data. Others believe it's a deliberate move - baked-in "watermarks" to ID machine-generated text. Either way: these artifacts leave a trail. UnicodeFix wipes it.
+
+---
+
 ## Installation
 
 Clone the repository and run the setup script:
 
-```bash
+```
 git clone https://github.com/unixwzrd/UnicodeFix.git
 cd UnicodeFix
 bash setup.sh
 ```
 
-The \`setup.sh\` script:
+The `setup.sh` script:
+- Creates a Python virtual environment just for UnicodeFix
+- Installs dependencies
+- Adds handy startup config to your `.bashrc` for one-command usage
 
-- Creates a dedicated Python virtual environment
-- Installs required dependencies
-- Adds startup configuration to your \`.bashrc\` for easier usage
+See [setup.sh](setup.sh) for the nitty-gritty.
 
-You can review [setup.sh](setup.sh) to see exactly what is modified.
+For serious environment nerds: [VenvUtil](https://github.com/unixwzrd/venvutil) is my full-featured Python env toolkit.
 
-I also maintain a broader toolset for virtual environment management here: [VenvUtil](https://github.com/unixwzrd/venvutil), which may be of interest for more advanced users.
+---
 
 ## Usage
 
 Once installed and activated:
 
-```bash
-(python-3.10-PA-dev) [unixwzrd@xanax: UnicodeFix]$ python bin/cleanup-text.py --help
-usage: cleanup-text.py [-h] [infile ...]
+```
+(python-3.10-PA-dev) [unixwzrd@xanax: UnicodeFix]$ cleanup-text --help
+
+usage: cleanup-text [-h] [-i] [-o OUTPUT] [-t] [-p] [-n] [infile ...]
 
-Clean Unicode quirks from text.
+Clean Unicode quirks from text. If no input files are given, reads from STDIN and writes to STDOUT (filter mode). If input files are given, creates cleaned files with .clean before the extension (e.g., foo.txt -> foo.clean.txt). Use -o - to force output to STDOUT for all input files, or -o <file> to specify a single output file
+(only with one input file).
 
 positional arguments:
 infile Input file(s)
 
 options:
--h, --help Show this help message and exit
+-h, --help show this help message and exit
+-i, --invisible Preserve invisible Unicode characters (zero-width, non-breaking, etc.)
+-o OUTPUT, --output OUTPUT
+Output file name, or '-' for STDOUT. Only valid with one input file, or use '-' for STDOUT with multiple files.
+-t, --temp In-place cleaning: Move each input file to .tmp, clean it, write cleaned output to original name, and delete .tmp after success.
+-p, --preserve-tmp With -t, preserve the .tmp file after cleaning (do not delete it). Useful for backup or manual recovery.
+-n, --no-newline Do not add a newline at the end of the output file (suppress final newline).
 ```
 
+## Brief Examples
+
 ### Pipe / Filter (STDIN to STDOUT)
+```
+cat file.txt | cleanup-text > cleaned.txt
+```
 
-UnicodeFix can operate as a standard UNIX pipe:
+### Batch Clean
+```
+cleanup-text *.txt
+```
 
-```bash
-cat file.txt | cleanup-text > cleaned.txt
+### In-Place (Safe) Clean
+```
+cleanup-text -t myfile.txt
 ```
 
-If no input file arguments are given, it automatically reads from standard input and writes to standard output.
+### Preserve Temp File for Backup
+```
+cleanup-text -t -p myfile.txt
+```
 
 ### Using in vi/vim/macvim
 
-You can run UnicodeFix as a filter within vi/vim/macvim:
-
-```vim
+```
 :%!cleanup-text
 ```
 
-This command rewrites the entire buffer with cleaned text.
+You can run it from Vim, VS Code in Vim mode, or as a pre-commit. Use it for email, blog posts, whatever. Ignore the naysayers - this is *real-world convenience.*
+
+See [cleanup-text.md](docs/cleanup-text.md) for deeper dives and arcane options.
+
+- **Make sure your Python environment is activated** before launching your editor, or wrap it in a shell script that does it for you.
+- Adjust your editor's shell settings as needed for best results.
 
-**Note**:
-- Ensure your virtual environment is activated before launching your editor, or
-- Use a shell wrapper that sources your \`.bashrc\` and activates the environment automatically.
+---
 
-Depending on how you manage virtual environments, you may need to adjust your editor’s shell invocation settings.
+## What's New / What's Cool
+
+- **Vaporizes invisible Unicode (unless you tell it not to)**
+- **Normalizes EM/EN dashes to true ASCII - no more AI " - " nonsense**
+- **Wipes AI "tells," watermarks, and digital fingerprints**
+- **Fixes trailing whitespace, normalizes newlines, burns the digital junk**
+- **Portable (Python 3.7+), cross-platform**
+- **Integrated macOS Shortcut for right-click cleaning in Finder**
+- **Can be used in CI/CD - but also by normal humans, not just pipeline freaks**
+
+> *Fun fact*: Even Python will execute code with "curly quotes." Your IDE, email client, and browser all sneak these in. UnicodeFix hunts them down and torches them.
+
+---
 
 ## Shortcut for macOS
 
-UnicodeFix includes a macOS Shortcut for direct Finder integration.
+UnicodeFix ships with a macOS Shortcut for direct Finder integration.
 
-You can right-click one or more files and select a Quick Action to clean Unicode quirks without opening a terminal.
+Right-click files, pick a Quick Action, and - bam - no terminal required.
 
 ### To add the Shortcut:
 
 1. Open the **Shortcuts** app.
-2. Navigate to \`File -> Import\`.
-
+2. Choose `File -> Import`.
 ![Shortcuts App Menu](docs/Screenshot%202025-04-25%20at%2005.50.57.png)
+3. Select the Shortcut in `macOS/Strip Unicode.shortcut`.
+![Import Shortcut](docs/Screenshot%202025-04-25%20at%2005.51.54.png)
+4. Edit it to point to your local `cleanup-text.py`.
+![Edit Shortcut Script Path](docs/Screenshot%202025-04-25%20at%2005.07.47.png)
+5. Relaunch Finder (`Cmd+Opt+Esc` → select Finder → Relaunch) if needed.
+6. After setup, right-click files, choose `Quick Actions`, select `Strip Unicode`.
+![Select Shortcut File](docs/Screenshot%202025-04-25%20at%2005.47.51.png)
 
-3. Select the Shortcut file located in \`macOS/Strip Unicode.shortcut\`.
+---
 
-![Import Shortcut](docs/Screenshot%202025-04-25%20at%2005.51.54.png)
+## What's in This Repository
 
-4. Edit the Shortcut to point to your local installation of \`cleanup-text.py\`.
+- [bin/cleanup-text.py](bin/cleanup-text.py) - Main cleaning script
+- [bin/cleanup-text](bin/cleanup-text) - Symlink for CLI usage
+- [setup.sh](setup.sh) - Easy setup and env configuration
+- [requirements.txt](requirements.txt) - Python dependencies
+- [macOS/](macOS/) - Shortcuts, scripts for Finder
+- [data/](data/) - Example test files
+- [test/](test/) - Automated test suite for all features/edge cases
+- [docs/](docs/) - Documentation and screenshots
+- [LICENSE](LICENSE)
+- [README.md](README.md) - This file
 
-![Edit Shortcut Script Path](docs/Screenshot%202025-04-25%20at%2005.07.47.png)
+---
 
-5. You may need to relaunch Finder (\`Command+Option+Esc\` → Select Finder → Relaunch).
+## Testing and CI/CD
 
-6. After setup, right-click selected files, choose \`Quick Actions\`, and select \`Strip Unicode\`.
+UnicodeFix comes with a full, automated test suite:
 
-![Select Shortcut File](docs/Screenshot%202025-04-25%20at%2005.47.51.png)
+- Runs every feature & scenario on files in `data/`
+- Outputs to `test_output/` (by scenario, with diffs and word counts)
+- Clean up with: `./test/test_all.sh clean`
+- Plug into your CI/CD pipeline or just use as a "paranoia check" before shipping anything
 
-## What's in This Repository
+**Pro tip:** Run the tests before you merge, publish, or email a "final" version.
+
+See [docs/test-suite.md](docs/test-suite.md) for the deep dive.
 
-- [bin/cleanup-text.py](bin/cleanup-text.py) — Main cleaning script
-- [bin/cleanup-text](bin/cleanup-text) — Symlink for command-line usage
-- [setup.sh](setup.sh) — Virtual environment setup script
-- [requirements.txt](requirements.txt) — Python dependencies
-- [macOS/](macOS/) — macOS Shortcut for Finder integration
-- [data/](data/) — Example test files with Unicode artifacts
-- [docs/](docs/) — Documentation and screenshots
-- [LICENSE](LICENSE) — License information
-- [README.md](README.md) — This file
+---
 
 ## Contributing
 
-Feedback, testing, bug reports, and pull requests are welcome.
+Feedback, bug reports, and patches welcome.
+
+If you've got a better integration path for your favorite platform, let's make it happen.
+Pull requests with attitude, creativity, and clean diffs appreciated.
 
-If you find a better integration path for Linux or Windows platforms, feel free to open an issue or contribute a patch.
+---
 
 ## Support This and Other Projects
 
-If you find UnicodeFix or my other projects valuable, please consider supporting continued development:
+If UnicodeFix (or my other projects) saved your bacon or made you smile,
+please consider fueling my caffeine habit and indie dev obsession:
 
 - [Patreon](https://patreon.com/unixwzrd)
 - [Ko-Fi](https://ko-fi.com/unixwzrd)
 - [Buy Me a Coffee](https://buymeacoffee.com/unixwzrd)
 
-Thank you for your support.
+One coffee = one more tool released to the wild.
+
+Thank you for keeping solo development alive!
+
+---
 
 ## Changelog
 
-### 2025-04-27
-- Fixed behavior when processing STDIN pipes
-- Added trailing whitespace and blank line normalization
-- Added shell script wrapper for easier activation from editors
+**See [CHANGELOG.md](CHANGELOG.md) for the latest drop.**
 
-### 2025-04-26
-- Initial release
+---
 
 ## License
 
-Copyright 2025
+Copyright 2025
 [unixwzrd@unixwzrd.ai](mailto:unixwzrd@unixwzrd.ai)
 
 [MIT License](LICENSE)
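The updated help text describes `-t`, `-p`, and `-n` as a small file workflow. The input file is moved to a `.tmp` file and cleaned, the result is written back under the original name, and the `.tmp` file is deleted unless `-p` was given. The following is a minimal sketch of that flow. It assumes the temporary file is named by appending `.tmp` to the full file name and that cleaning is a plain `str -> str` function. `clean_in_place` is hypothetical, and restoring the original on failure is an extra safety step that the help text does not mention.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def clean_in_place(path: Path, clean: Callable[[str], str],
                   preserve_tmp: bool = False, final_newline: bool = True) -> Path:
    """Hypothetical sketch of the -t / -p / -n workflow; returns the .tmp path.

    Assumes the temporary name is the full file name plus ".tmp".
    """
    tmp = path.with_name(path.name + ".tmp")
    os.replace(path, tmp)  # 1. move the original aside
    try:
        text = clean(tmp.read_text(encoding="utf-8"))  # 2. clean the .tmp copy
        if final_newline and not text.endswith("\n"):
            text += "\n"  # default: make sure the file ends with a newline
        elif not final_newline:
            text = text.rstrip("\n")  # -n: suppress trailing newline(s)
        path.write_text(text, encoding="utf-8")  # 3. write under the original name
    except Exception:
        os.replace(tmp, path)  # added safety step: put the original back
        raise
    if not preserve_tmp:
        tmp.unlink()  # 4. delete the .tmp unless -p was given
    return tmp


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        f = Path(d) / "notes.txt"
        f.write_text("\u201cdone\u201d", encoding="utf-8")
        kept = clean_in_place(f, lambda s: s.translate({0x201C: '"', 0x201D: '"'}),
                              preserve_tmp=True)
        print(repr(f.read_text(encoding="utf-8")), kept.exists())
```

`os.replace` is used for the moves because it overwrites an existing destination on both POSIX and Windows. This allows the failure path to restore the original even if a partial output file was already written.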
