42 commits
63e108c
Update configuration.rst
FazeelUsmani Nov 7, 2025
caae7eb
Add linkcheck_ignore_case config option
FazeelUsmani Nov 7, 2025
9e6dd40
Update i18n.py
FazeelUsmani Nov 7, 2025
eccd6d7
fixed the failing test test_numfig_disabled_warn
FazeelUsmani Nov 7, 2025
6300483
Enable case-insensitive URL and anchor checking for linkcheck builder
FazeelUsmani Nov 7, 2025
b61366c
strip ANSI color codes from stderr before assertion
FazeelUsmani Nov 7, 2025
7ea45c6
fixed the failing test test_connect_to_selfsigned_fails
FazeelUsmani Nov 7, 2025
99a5dc0
Update test_build_linkcheck.py
FazeelUsmani Nov 7, 2025
f99651f
Merge branch 'master' into linkcheck_case_insensitive
FazeelUsmani Nov 10, 2025
ac12d63
Update linkcheck.py
FazeelUsmani Nov 11, 2025
1a0d9ed
Update test_build_linkcheck.py
FazeelUsmani Nov 11, 2025
d115b1e
Update test_build_linkcheck.py
FazeelUsmani Nov 11, 2025
0075419
fix ruff check linkcheck.py
FazeelUsmani Nov 11, 2025
4eceef2
fix ruff check test_build_linkcheck.py
FazeelUsmani Nov 11, 2025
e772df9
Update configuration.rst
FazeelUsmani Nov 11, 2025
14ded5b
Update configuration.rst
FazeelUsmani Nov 11, 2025
386d4ac
Update configuration.rst
FazeelUsmani Nov 11, 2025
53a47e3
Update doc/usage/configuration.rst
FazeelUsmani Nov 12, 2025
3e545f3
Update i18n.py (revert \)
FazeelUsmani Nov 12, 2025
d9940da
Use .casefold() for case-insensitive URL comparison
FazeelUsmani Nov 12, 2025
322fcf5
Update test_build_linkcheck.py (revert)
FazeelUsmani Nov 12, 2025
cfcbef2
Update test_build_linkcheck.py (revert)
FazeelUsmani Nov 12, 2025
2c4567d
restore original pytest markers
FazeelUsmani Nov 12, 2025
c18d573
Removed the duplicate @pytest.mark.sphinx
FazeelUsmani Nov 12, 2025
07b1795
Removed test_linkcheck_anchors_remain_case_sensitive
FazeelUsmani Nov 12, 2025
bc8fa7c
Rename linkcheck_ignore_case to linkcheck_case_insensitive and update…
FazeelUsmani Nov 13, 2025
029a720
Fix ruff format check
FazeelUsmani Nov 13, 2025
539adaa
remove unused code paths
FazeelUsmani Nov 17, 2025
ae5708f
Merge branch 'master' into linkcheck_case_insensitive
FazeelUsmani Nov 17, 2025
66ae54d
Remove unused test parameter from numfig test
FazeelUsmani Nov 17, 2025
5bc9f2d
Tests: Add complete coverage for linkcheck case sensitivity tests
FazeelUsmani Nov 18, 2025
eaa1caa
Refactor linkcheck case sensitivity: rename config and fix fragment h…
FazeelUsmani Nov 18, 2025
57e8b3c
Improve formatting and update config value handling
FazeelUsmani Nov 18, 2025
5dffff4
Update tests/test_builders/test_build_linkcheck.py
FazeelUsmani Nov 18, 2025
5e08ab3
Remove deprecated linkcheck_case_insensitive config handling
FazeelUsmani Nov 18, 2025
45cf720
Merge branch 'linkcheck_case_insensitive' of github.com:FazeelUsmani/…
FazeelUsmani Nov 18, 2025
06663cf
Refactor linkcheck tests: rename handler for case sensitivity and sim…
FazeelUsmani Nov 18, 2025
5615ffc
Add support for case-insensitive URL checking in linkcheck builder
FazeelUsmani Nov 18, 2025
842b756
restore @pytest.mark.test_params and update documentation
FazeelUsmani Nov 19, 2025
1fe4293
Refactor linkcheck case sensitivity tests with dynamic path handler
FazeelUsmani Nov 20, 2025
8c7648b
"Update test document with path1 and path2 for case sensitivity tests
FazeelUsmani Nov 20, 2025
d95224b
Apply ruff formatting
FazeelUsmani Nov 20, 2025
5 changes: 5 additions & 0 deletions CHANGES.rst
@@ -48,6 +48,11 @@ Features added
* #13439: linkcheck: Permit warning on every redirect with
  ``linkcheck_allowed_redirects = {}``.
  Patch by Adam Turner and James Addison.
* #14046: linkcheck: Add :confval:`linkcheck_case_insensitive` configuration to
  allow case-insensitive URL comparison for specific URL patterns.
  This is useful for links to websites that normalise URL casing (for example,
  GitHub) or case-insensitive servers.
  Patch by Fazeel Usmani.
* #13497: Support C domain objects in the table of contents.
* #13500: LaTeX: add support for ``fontawesome6`` package.
  Patch by Jean-François B.
32 changes: 32 additions & 0 deletions doc/usage/configuration.rst
@@ -3813,6 +3813,38 @@ and the number of workers to use.

   .. versionadded:: 7.3

.. confval:: linkcheck_case_insensitive
   :type: :code-py:`list` of :code-py:`str`
   :default: :code-py:`[]`

   A list of regular expressions that match URLs for which the *linkcheck*
   builder should perform case-insensitive comparisons. This is useful for
   links to websites that normalise URL casing (for example, GitHub) or
   servers that are case-insensitive (for example, Windows-based servers).

   By default, *linkcheck* requires the destination URL to match the
   documented URL case-sensitively. For example, a link to
   ``http://example.com/PATH`` that redirects to ``http://example.com/path``
   will be reported as ``redirected``.

   If the URL matches a pattern in this list, such redirects will instead be
   reported as ``working``. Patterns are matched against the start of the URL,
   as with :py:func:`re.match`.

   For example, to treat all GitHub URLs as case-insensitive:

   .. code-block:: python

      linkcheck_case_insensitive = [
          r'https://github\.com/.*',
      ]

   .. note::

      HTML anchor checking is always case-sensitive and is not affected by
      this setting.

   .. versionadded:: 8.2

.. confval:: linkcheck_rate_limit_timeout
   :type: :code-py:`int`
   :default: :code-py:`300`
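Before adding a pattern list to conf.py, it can be sanity-checked by applying it the same way as the builder change in this PR. There, each entry is compiled with re.compile and tested with pattern.match(req_url), so a pattern only has to match the beginning of the URL. The sketch below assumes exactly that behaviour. The is_case_insensitive helper and the second example pattern are hypothetical and exist only for illustration.

```python
import re

# Example conf.py value: the first pattern comes from the documentation above;
# the second is a hypothetical prefix-only pattern added for illustration.
linkcheck_case_insensitive = [
    r'https://github\.com/.*',
    r'https://example\.org/docs/',
]

# Compile each pattern once, mirroring map(re.compile, ...) in the builder.
patterns = [re.compile(p) for p in linkcheck_case_insensitive]


def is_case_insensitive(url: str) -> bool:
    """Return True if any pattern matches at the start of ``url``.

    re.Pattern.match anchors only at the beginning of the string, so a
    pattern behaves like a prefix test unless it ends with ``$``.
    """
    return any(pattern.match(url) for pattern in patterns)


print(is_case_insensitive('https://github.com/sphinx-doc/sphinx'))  # True
print(is_case_insensitive('https://example.org/docs/Intro'))  # True: prefix match
print(is_case_insensitive('http://github.com/sphinx-doc/sphinx'))  # False: scheme differs
print(is_case_insensitive('https://GitHub.com/sphinx-doc/sphinx'))  # False: pattern is case-sensitive

# An inline (?i) flag makes the pattern itself ignore case.
relaxed = re.compile(r'(?i)https://github\.com/')
print(bool(relaxed.match('https://GitHub.com/sphinx-doc/sphinx')))  # True
```

The pattern match itself is an ordinary, case-sensitive regular-expression match. A link written as https://GitHub.com/... is therefore not covered by r'https://github\.com/.*' unless the pattern allows for it, for example with an inline (?i) flag.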
29 changes: 28 additions & 1 deletion sphinx/builders/linkcheck.py
@@ -409,6 +409,9 @@ def __init__(
        self.user_agent = config.user_agent
        self.tls_verify = config.tls_verify
        self.tls_cacerts = config.tls_cacerts
        self.case_insensitive_patterns: list[re.Pattern[str]] = list(
            map(re.compile, config.linkcheck_case_insensitive)
        )

        self._session = requests._Session(
            _ignored_redirects=tuple(map(re.compile, config.linkcheck_ignore))
@@ -629,8 +632,29 @@ def _check_uri(self, uri: str, hyperlink: Hyperlink) -> _URIProperties:
        netloc = urlsplit(req_url).netloc
        self.rate_limits.pop(netloc, None)

        # Check if URL should be compared case-insensitively based on patterns
        is_case_insensitive = any(
            pattern.match(req_url) for pattern in self.case_insensitive_patterns
        )

        # Compare URLs, optionally case-insensitively
        def _normalise_url(url: str) -> str:
            """Reduces a URL to a normal/equality-comparable form."""
            normalised_url = url.rstrip('/')
            if is_case_insensitive:
                # Only casefold the URL before the fragment; fragments are case-sensitive
                if '#' in normalised_url:
                    url_part, fragment = normalised_url.split('#', 1)
                    normalised_url = url_part.casefold() + '#' + fragment
                else:
                    normalised_url = normalised_url.casefold()
            return normalised_url

        normalised_request_url = _normalise_url(req_url)
        normalised_response_url = _normalise_url(response_url)

        if (
-            (response_url.rstrip('/') == req_url.rstrip('/'))
+            normalised_request_url == normalised_response_url
            or _allowed_redirect(req_url, response_url, self.allowed_redirects)
        ):  # fmt: skip
            return _Status.WORKING, '', 0
@@ -816,6 +840,9 @@ def setup(app: Sphinx) -> ExtensionMetadata:
    app.add_config_value(
        'linkcheck_report_timeouts_as_broken', False, '', types=frozenset({bool})
    )
    app.add_config_value(
        'linkcheck_case_insensitive', [], '', types=frozenset({list, tuple})
    )

    app.add_event('linkcheck-process-uri')

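To see what the new comparison in _check_uri accepts, the normalisation logic can be lifted into a standalone function. This is a sketch, not the builder itself. The name urls_equivalent is hypothetical, and the _allowed_redirect() fallback is deliberately omitted. Otherwise it follows the hunk above: case-insensitivity is decided from the requested URL only, trailing slashes are stripped, and only the part before the first '#' is casefolded.

```python
import re


def urls_equivalent(
    req_url: str, response_url: str, patterns: list[re.Pattern[str]]
) -> bool:
    """Return True if response_url should count as the same URL as req_url.

    Sketch of the comparison this PR adds to _check_uri; the
    _allowed_redirect() fallback is deliberately left out.
    """
    # Case-insensitivity is decided from the requested URL only.
    is_case_insensitive = any(pattern.match(req_url) for pattern in patterns)

    def normalise(url: str) -> str:
        normalised = url.rstrip('/')
        if is_case_insensitive:
            # Casefold everything before the first '#'; keep the fragment as-is.
            if '#' in normalised:
                url_part, fragment = normalised.split('#', 1)
                normalised = url_part.casefold() + '#' + fragment
            else:
                normalised = normalised.casefold()
        return normalised

    return normalise(req_url) == normalise(response_url)


patterns = [re.compile(r'http://localhost:\d+/path1')]
base = 'http://localhost:7777'

# path1 matches the pattern, so a redirect that only changes case is accepted.
print(urls_equivalent(f'{base}/path1', f'{base}/PATH1', patterns))  # True
# path2 does not match, so the same kind of redirect still counts as different.
print(urls_equivalent(f'{base}/path2', f'{base}/PATH2', patterns))  # False
# Fragments keep their case even when the pattern matches.
print(urls_equivalent(f'{base}/path1#Intro', f'{base}/PATH1#intro', patterns))  # False
```

Using casefold() rather than lower() matters for some non-ASCII text. 'straße'.casefold() and 'STRASSE'.casefold() both give 'strasse', whereas 'straße'.lower() stays 'straße'. Everything before the '#' is folded, including any query string.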
1 change: 1 addition & 0 deletions tests/roots/test-linkcheck-case-check/conf.py
@@ -0,0 +1 @@
# Empty config for linkcheck case sensitivity tests
3 changes: 3 additions & 0 deletions tests/roots/test-linkcheck-case-check/index.rst
@@ -0,0 +1,3 @@
`path1 <http://localhost:7777/path1>`_

`path2 <http://localhost:7777/path2>`_
116 changes: 116 additions & 0 deletions tests/test_builders/test_build_linkcheck.py
@@ -1439,3 +1439,119 @@ def test_linkcheck_exclude_documents(app: SphinxTestApp) -> None:
        'uri': 'https://www.sphinx-doc.org/this-is-another-broken-link',
        'info': 'br0ken_link matched br[0-9]ken_link from linkcheck_exclude_documents',
    } in content


class CapitalisePathHandler(BaseHTTPRequestHandler):
    """Test server that capitalises URL paths via redirects."""

    protocol_version = 'HTTP/1.1'

    def do_HEAD(self):
        # Use same logic as GET but don't send body
        if self.path.startswith('/') and len(self.path) > 1 and self.path[1:].islower():
            # Redirect lowercase paths to capitalized versions
            self.send_response(301, 'Moved Permanently')
            self.send_header('Location', '/' + self.path[1:].capitalize())
            self.send_header('Content-Length', '0')
            self.end_headers()
        elif (
            self.path.startswith('/')
            and len(self.path) > 1
            and self.path[1].isupper()
            and self.path[2:].islower()
        ):
            # Serve capitalized paths
            self.send_response(200, 'OK')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            self.send_response(404, 'Not Found')
            self.send_header('Content-Length', '0')
            self.end_headers()

    def do_GET(self):
        if self.path.startswith('/') and len(self.path) > 1 and self.path[1:].islower():
            # Redirect lowercase paths to capitalized versions
            self.send_response(301, 'Moved Permanently')
            self.send_header('Location', '/' + self.path[1:].capitalize())
            self.send_header('Content-Length', '0')
            self.end_headers()
        elif (
            self.path.startswith('/')
            and len(self.path) > 1
            and self.path[1].isupper()
            and self.path[2:].islower()
        ):
            # Serve capitalized paths
            content = b'ok\n\n'
            self.send_response(200, 'OK')
            self.send_header('Content-Length', str(len(content)))
            self.end_headers()
            self.wfile.write(content)
        else:
            self.send_response(404, 'Not Found')
            self.send_header('Content-Length', '0')
            self.end_headers()

@pytest.mark.sphinx(
    'linkcheck',
    testroot='linkcheck-case-check',
    freshenv=True,
)
def test_linkcheck_case_sensitive(app: SphinxTestApp) -> None:
    """Test that case-sensitive checking is the default behavior."""
    with serve_application(app, CapitalisePathHandler) as address:
        app.build()

    content = (app.outdir / 'output.json').read_text(encoding='utf8')
    rows = [json.loads(x) for x in content.splitlines()]
    rowsby = {row['uri']: row for row in rows}

    # With case-sensitive checking (default), URLs that redirect to different case
    # should be marked as redirected
    assert rowsby[f'http://{address}/path1']['status'] == 'redirected'
    assert rowsby[f'http://{address}/path2']['status'] == 'redirected'


@pytest.mark.sphinx(
    'linkcheck',
    testroot='linkcheck-case-check',
    freshenv=True,
    confoverrides={'linkcheck_case_insensitive': [r'http://localhost:\d+/.*']},
)
def test_linkcheck_case_insensitive(app: SphinxTestApp) -> None:
Contributor

Optional: I think it would be possible to merge the two test cases into a single def test_linkcheck_case_sensitivity now that we can pattern-match different URL patterns.

(the code for the two of them is very similar currently, so perhaps the end result would be neater/smaller by combining them)

Author

I explored merging the tests, but found that keeping them separate is actually clearer and simpler. Merging them would require either:

  1. Having two different URLs in the test document with hardcoded ports (which don't match the dynamic test server port)
  2. More complex test logic to construct different URLs

Since you mentioned this was optional, I've kept the two separate tests as they clearly demonstrate:

  1. test_linkcheck_case_sensitive: Default behavior (no patterns configured)
  2. test_linkcheck_case_insensitive: Pattern-based behavior (with a specific pattern)

This makes the tests easier to understand and maintain. Please let me know if you still prefer them to be merged.

Contributor

That seems reasonable to me, yep - thanks! One detail I'd like to ask about: with approach (2), was the extra complexity in the CapitalisePathHandler? (that's what I'd guess, but would like to double-check)

Author

Actually, the complexity wasn't in the CapitalisePathHandler - that part would stay the same (it just redirects /path → /Path for any request).

The complexity would be in the test logic itself. To test both behaviors in one test, we'd need:

  1. Two different URLs in the test RST document (e.g., localhost and 127.0.0.1)
  2. The pattern configured to match only one of them (e.g., only 127.0.0.1)
  3. Test assertions to verify different behavior for each URL

The issue is the RST document has hardcoded ports (e.g., http://localhost:7777/path), but the test server uses a dynamic port. So we'd need logic to construct the correct URLs with the dynamic port in the test assertions, which felt more complex than just having two focused tests.

Contributor

Ok; yep, good observation about the dynamic port allocation for the test webserver vs the static port in the documentation.

And also point taken/agreed about the single /path -> /Path transform offered by the handler.

We could add more path transforms to the handler. Could that help? (I think it might, but I haven't experimented with it in code yet)

Author

Yes, adding more path transforms to the handler could definitely help!

Here's what I'm thinking:

  from http.server import BaseHTTPRequestHandler


  class CapitalisePathHandler(BaseHTTPRequestHandler):
      """Test server that capitalises URL paths via redirects."""

      protocol_version = 'HTTP/1.1'

      PATH_REDIRECTS = {
          '/path1': '/Path1',
          '/path2': '/Path2',
      }

      def do_GET(self):
          if self.path in self.PATH_REDIRECTS:
              self.send_response(301, 'Moved Permanently')
              self.send_header('Location', self.PATH_REDIRECTS[self.path])
              self.send_header('Content-Length', '0')
              self.end_headers()
          elif self.path in self.PATH_REDIRECTS.values():
              content = b'ok\n\n'
              self.send_response(200, 'OK')
              self.send_header('Content-Length', str(len(content)))
              self.end_headers()
              self.wfile.write(content)
          else:
              self.send_response(404, 'Not Found')
              self.send_header('Content-Length', '0')
              self.end_headers()

Then the test RST could have:

  `path1 <http://localhost:7777/path1>`_
  `path2 <http://localhost:7777/path2>`_

And we configure the pattern to match only path1:

  confoverrides={'linkcheck_case_insensitive': [r'http://localhost:\d+/path1']}

This way we can assert:

  • path1 → working (case-insensitive applies)
  • path2 → redirected (case-sensitive applies)

The dynamic port issue is actually not a problem for assertions since we use the actual address variable there. The hardcoded port in the RST only matters for the pattern matching, which this approach handles cleanly.

Want me to implement this?

Contributor

Nice, ok! Let's try that, but with a change: I'm not too keen on the hardcoded redirect paths, so perhaps we could instead test for self.path.islower(), and when it is, then redirect the client to self.path.capitalize(). I think that'd help to make the resulting code even more concise.

Author

Done! I've implemented your suggestion with the dynamic path detection.

Contributor

Ok, thank you - there may be a couple of cleanups possible:

  • We have both do_GET and do_HEAD implemented again - I think we should apply the same simplification as previously and remove one of those.
  • There are still two test cases, where I think we could have a single test_linkcheck_case_sensitivity that handles both case sensitive and non-case-sensitive variants.

I've realized that my suggestion to use self.path.capitalize() isn't applicable as-is. Sorry about that; perhaps we could use self.path.upper() instead, though.

    """Test that URLs matching linkcheck_case_insensitive patterns ignore case differences."""
    with serve_application(app, CapitalisePathHandler) as address:
        app.build()

    content = (app.outdir / 'output.json').read_text(encoding='utf8')
    rows = [json.loads(x) for x in content.splitlines()]
    rowsby = {row['uri']: row for row in rows}

    # With case-insensitive pattern matching, URLs that differ only in case
    # should be marked as working
    assert rowsby[f'http://{address}/path1']['status'] == 'working'
    assert rowsby[f'http://{address}/path2']['status'] == 'working'


@pytest.mark.sphinx(
    'linkcheck',
    testroot='linkcheck-case-check',
    freshenv=True,
    confoverrides={'linkcheck_case_insensitive': [r'http://localhost:\d+/path1']},
)
def test_linkcheck_mixed_case_sensitivity(app: SphinxTestApp) -> None:
    """Test both case-sensitive and case-insensitive checking in one test."""
    with serve_application(app, CapitalisePathHandler) as address:
        app.build()

    content = (app.outdir / 'output.json').read_text(encoding='utf8')
    rows = [json.loads(x) for x in content.splitlines()]
    rowsby = {row['uri']: row for row in rows}

    # path1 matches case-insensitive pattern → should be 'working'
    assert rowsby[f'http://{address}/path1']['status'] == 'working'

    # path2 doesn't match pattern → should be 'redirected'
    assert rowsby[f'http://{address}/path2']['status'] == 'redirected'