Validation and Regex Compatibility Issues in python-string-utils
Summary
Several validation functions appear to reject inputs that are commonly accepted in standard formats. Property-based and targeted tests reveal at least three distinct issues:
- Scientific notation support in is_number/is_decimal is incomplete (rejects uppercase E and signed exponents such as e-3)
- URL port parsing is overly restrictive (rejects single-digit ports)
- Pangram detection is case-sensitive (fails uppercase-only pangrams)
Affected Code
- Scientific notation regex: string_utils/_regex.py:7
- URL regex port segment: string_utils/_regex.py:14
- Pangram implementation: string_utils/validation.py:510-514
- Number validation: string_utils/validation.py:135-138
- Decimal validation: string_utils/validation.py:159-172
Environment
- OS: Windows
- Python: 3.10
- Command: python run_added_tests.py
Expected Behavior
- is_number and is_decimal should accept scientific notation using both lowercase and uppercase E, and should allow signed exponents (e.g., 1e-3, 1E+3, 1.5e-3).
- is_url should accept 1–5 digit port numbers (standard range 0–65535), e.g., http://localhost:8.
- is_pangram should be case-insensitive (uppercase-only pangrams should pass).
Actual Behavior
- Scientific notation:
  - is_number('1E3') → False
  - is_number('1e-3') → False
  - is_decimal('1.5e-3') → False
- URL port:
  - is_url('http://localhost:8') → False
  - is_url('http://127.0.0.1:7') → False
- Pangram:
  - is_pangram('THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG') → False
Code References
- Scientific notation regex currently (string_utils/_regex.py:7):
  NUMBER_RE = re.compile(r'^([+\-]?)((\d+)(\.\d+)?(e\d+)?|\.\d+)$')
  - Limitations:
    - Only lowercase e
    - Exponent requires digits only (\d+), no optional sign
- URL port segment (string_utils/_regex.py:14):
  - (:\d{2,})? requires at least two digits for the port
- Pangram (string_utils/validation.py:510-514):
  - Compares set of characters to string.ascii_lowercase without case normalization
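The limitations listed above can be checked in isolation, without installing the library. The sketch below copies NUMBER_RE exactly as quoted in this issue. HOST_PORT_RE and naive_is_pangram are hypothetical stand-ins written only from the descriptions above (the full URL regex and the body of is_pangram are not reproduced in this report), so they illustrate the reported behavior rather than the library's actual code.

```python
import re
import string

# Copied verbatim from the issue text (string_utils/_regex.py:7).
NUMBER_RE = re.compile(r'^([+\-]?)((\d+)(\.\d+)?(e\d+)?|\.\d+)$')

# Hypothetical stand-in: a bare host followed by the quoted port segment.
# The real URL regex is larger; only (:\d{2,})? comes from the issue.
HOST_PORT_RE = re.compile(r'^[\w.]+(:\d{2,})?$')


def naive_is_pangram(text: str) -> bool:
    """Hypothetical stand-in: set comparison with no case normalization."""
    return set(string.ascii_lowercase).issubset(set(text))


# Lowercase, unsigned exponents are accepted ...
print(bool(NUMBER_RE.match('1e3')))      # True
# ... but uppercase E and signed exponents are not.
print(bool(NUMBER_RE.match('1E3')))      # False
print(bool(NUMBER_RE.match('1e-3')))     # False
print(bool(NUMBER_RE.match('1.5e-3')))   # False

# Two-digit ports match, single-digit ports do not.
print(bool(HOST_PORT_RE.match('localhost:80')))  # True
print(bool(HOST_PORT_RE.match('localhost:8')))   # False

# An uppercase-only pangram contains no lowercase letters at all.
print(naive_is_pangram('THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG'))  # False
```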
Suggested Fixes
- is_number/is_decimal:
  - Update NUMBER_RE to accept both e and E, and an optional sign before exponent digits, e.g.:
    - Allow pattern segment like [eE][+\-]?\d+
  - Ensure downstream checks (is_integer, is_decimal) keep consistent semantics when scientific notation is used.
- is_url:
  - Relax port segment to (:\d{1,5})? and optionally validate numeric range (0–65535) outside the regex if desired.
- is_pangram:
  - Normalize input to lowercase (or use a case-insensitive comparison) before computing set inclusion.
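The suggestions above could look roughly like the following. This is a minimal sketch, not a patch against the library: NUMBER_RE_PROPOSED assumes the rest of the quoted NUMBER_RE stays unchanged, and port_in_range and is_pangram_ci are hypothetical helper names introduced here for illustration. How is_integer and is_decimal should treat the new exponent forms is left open.

```python
import re
import string

# Same structure as the quoted NUMBER_RE; only the exponent segment changes
# from (e\d+)? to ([eE][+\-]?\d+)?.
NUMBER_RE_PROPOSED = re.compile(
    r'^([+\-]?)((\d+)(\.\d+)?([eE][+\-]?\d+)?|\.\d+)$'
)

# ASCII digits only, 1 to 5 of them (the shape allowed by (:\d{1,5})?).
PORT_DIGITS_RE = re.compile(r'[0-9]{1,5}')


def port_in_range(port_text: str) -> bool:
    """Hypothetical helper: numeric range check done outside the regex."""
    if PORT_DIGITS_RE.fullmatch(port_text) is None:
        return False
    return int(port_text) <= 65535


def is_pangram_ci(text: str) -> bool:
    """Hypothetical helper: lowercase the input before the set comparison."""
    return set(string.ascii_lowercase).issubset(text.lower())


for sample in ('1E3', '1e-3', '1E+3', '1.5e-3'):
    print(sample, bool(NUMBER_RE_PROPOSED.match(sample)))  # all True

print(port_in_range('8'))       # True
print(port_in_range('65536'))   # False, 5 digits but out of range
print(is_pangram_ci('THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG'))  # True
```

Keeping the range check separate from the regex avoids encoding 0–65535 as a long alternation, at the cost of a second validation step.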