-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add linkcheck_case_sensitive configuration option #14046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 36 commits
63e108c
caae7eb
9e6dd40
eccd6d7
6300483
b61366c
7ea45c6
99a5dc0
f99651f
ac12d63
1a0d9ed
d115b1e
0075419
4eceef2
e772df9
14ded5b
386d4ac
53a47e3
3e545f3
d9940da
322fcf5
cfcbef2
2c4567d
c18d573
07b1795
bc8fa7c
029a720
539adaa
ae5708f
66ae54d
5bc9f2d
eaa1caa
57e8b3c
5dffff4
5e08ab3
45cf720
06663cf
5615ffc
842b756
1fe4293
8c7648b
d95224b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # Empty config for linkcheck case sensitivity tests |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| `local server path <http://localhost:7777/path>`_ |
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -1439,3 +1439,87 @@ def test_linkcheck_exclude_documents(app: SphinxTestApp) -> None: | |||||||||
| 'uri': 'https://www.sphinx-doc.org/this-is-another-broken-link', | ||||||||||
| 'info': 'br0ken_link matched br[0-9]ken_link from linkcheck_exclude_documents', | ||||||||||
| } in content | ||||||||||
|
|
||||||||||
|
|
||||||||||
| class CaseSensitiveHandler(BaseHTTPRequestHandler): | ||||||||||
|
||||||||||
| class CaseSensitiveHandler(BaseHTTPRequestHandler): | |
| class CapitalisePathHandler(BaseHTTPRequestHandler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it from CaseSensitiveHandler to CapitalisePathHandler and updated the docstring to reflect what it actually does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove the do_HEAD code block here, as the base HTTP server will invoke do_GET instead if it's missing.
NB: Let's also apply the Location response header simplification to do_GET when we do that, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I've removed the do_HEAD method entirely - the base HTTP server will invoke do_GET instead. I've also simplified the Location header in do_GET to just use /Path instead of the full URL with host.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the containment check is good defensive coding, we don't perform similar checks in the other tests, so for consistency I would recommend:
| lowercase_uri = f'http://{address}/path' | |
| assert lowercase_uri in rowsby, f'Expected {lowercase_uri} to be checked' | |
| assert rowsby[lowercase_uri]['status'] == 'redirected' | |
| assert rowsby[f'http://{address}/path']['status'] == 'redirected' |
(I briefly considered suggesting renaming the variable to something like origin_url, outbound_url or documented_url -- but on balance I think we should simply omit it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've simplified both tests to directly assert on rowsby[f'http://{address}/path']['status'] without the intermediate variable or defensive containment check, matching the style used in other tests in the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: I think it would be possible to merge the two test cases into a single def test_linkcheck_case_sensitivity now that we can pattern-match different URL patterns.
(the code for the two of them is very similar currently, so perhaps the end result would be neater/smaller by combining them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explored merging the tests, but found that keeping them separate is actually clearer and simpler. Merging them would require either:
- Having two different URLs in the test document with hardcoded ports (which don't match the dynamic test
server port) - More complex test logic to construct different URLs
Since you mentioned this was optional, I've kept the two separate tests as they clearly demonstrate:
- test_linkcheck_case_sensitive: Default behavior (no patterns configured)
- test_linkcheck_case_insensitive: Pattern-based behavior (with a specific pattern)
This makes the tests easier to understand and maintain. Please let me know if you still prefer them to be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable to me, yep - thanks! One detail I'd like to ask about: with approach (2), was the more complex in the CapitalisePathHandler? (that's what I'd guess, but would like to double-check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the complexity wasn't in the CapitalisePathHandler - that part would stay the same (it just redirects /path → /Path for any request).
The complexity would be in the test logic itself. To test both behaviors in one test, we'd need:
- Two different URLs in the test RST document (e.g.,
localhostand127.0.0.1) - The pattern configured to match only one of them (e.g., only
127.0.0.1) - Test assertions to verify different behavior for each URL
The issue is the RST document has hardcoded ports (e.g., http://localhost:7777/path), but the test server uses a dynamic port. So we'd need logic to construct the correct URLs with the dynamic port in the test assertions, which felt more complex than just having two focused tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok; yep, good observation about the dynamic port allocation for the test webserver vs the static port in the documentation.
And also point taken/agreed about the single /path -> /Path transform offered by the handler.
We could add more path transforms to the handler.. could that help? (I think it might do, but I haven't experimented with it in code yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, adding more path transforms to the handler could definitely help!
Here's what I'm thinking:
class CapitalisePathHandler(BaseHTTPRequestHandler):
"""Test server that capitalises URL paths via redirects."""
protocol_version = 'HTTP/1.1'
PATH_REDIRECTS = {
'/path1': '/Path1',
'/path2': '/Path2',
}
def do_GET(self):
if self.path in self.PATH_REDIRECTS:
self.send_response(301, 'Moved Permanently')
self.send_header('Location', self.PATH_REDIRECTS[self.path])
self.send_header('Content-Length', '0')
self.end_headers()
elif self.path in self.PATH_REDIRECTS.values():
content = b'ok\n\n'
self.send_response(200, 'OK')
self.send_header('Content-Length', str(len(content)))
self.end_headers()
self.wfile.write(content)
else:
self.send_response(404, 'Not Found')
self.send_header('Content-Length', '0')
self.end_headers()Then the test RST could have:
path1 <http://localhost:7777/path1>_
path2 <http://localhost:7777/path2>_
And we configure the pattern to match only path1:
confoverrides={'linkcheck_case_insensitive': [r'http://localhost:\d+/path1']}
This way we can assert:
- path1 → working (case-insensitive applies)
- path2 → redirected (case-sensitive applies)
The dynamic port issue is actually not a problem for assertions since we use the actual address variable there. The hardcoded port in the RST only matters for the pattern matching, which this approach handles cleanly.
Want me to implement this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, ok! Let's try that, but with a change: I'm not too keen on the hardcoded redirect paths, so perhaps we could instead test for self.path.islower(), and when it is, then redirect the client to self.path.capitalize(). I think that'd help to make the resulting code even more concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! I've implemented your suggestion with the dynamic path detection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thank you - there may be a couple of cleanups possible:
- We have both
do_GETanddo_HEADimplemented again - I think we should apply the same simplification as previously and remove one of those. - There are still two test cases, where I think we could have a single
test_linkcheck_case_sensitivitythat handles both case sensitive and non-case-sensitive variants.
I've realized that my suggestion to use self.path.capitalize() isn't applicable as-is.. sorry about that - perhaps we could use self.path.upper() instead, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure to restore this
pytestmarker (I think you did, but maybe it got removed again somehow).