-
Notifications
You must be signed in to change notification settings - Fork 8.3k
libc: fnmatch: Add to A.5 rule and common #98827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
fnmatch() is used by the shell, http server & coverage modules. Given that quite a few subsystems need it, let's add it to the C library common functions, and to the expected functions from the C library in rule A.5, instead of relaying on the corresponding POSIX subsystem option being permanently enabled. Several C libraries already implement it. For those libc which do not, enable the common implementation found in this commit. Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
Some of the fnmatch tests didn't match POSIX.
> - zassert_ok(fnmatch("[[?*\\]", "\\", 0));
This pattern includes a back-slash, and because FNM_NOESCAPE was not
specified, it will escape the ], so there isn't actually a range expression
here, nor is there any literal backslash in the pattern. So, this does not
match backslash. Replace this with three tests to check each of these
issues:
> + /* \\ escapes ], no bracket expr, no match */
> + zassert_equal(fnmatch("[[?*\\]", "\\", 0), FNM_NOMATCH);
> + /* \\ unescaped, match */
> + zassert_ok(fnmatch("[[?*\\]", "\\", FNM_NOESCAPE));
> + /* \\ escapes \\, match */
> + zassert_ok(fnmatch("[[?*\\\\]", "\\", 0));
> - zassert_ok(fnmatch("[]?*\\]", "]", 0));
This one has the same issue, except it tries to match ] instead. Again,
because the \ will end up escaping the final ], this is not a range
expression, and so it doesn't match ]. Replace this with three tests to
check each case separately:
> + /* \\ escapes ], no bracket expr, no match */
> + zassert_equal(fnmatch("[]?*\\]", "]", 0), FNM_NOMATCH);
> + /* \\ unescaped, match */
> + zassert_ok(fnmatch("[]?*\\]", "]", FNM_NOESCAPE));
> + /* \\ escapes \\, match */
> + zassert_ok(fnmatch("[]?*\\\\]", "]", 0));
> - /* zassert_ok(fnmatch("*[[:alpha:]]/""*[[:alnum:]]", "a/b",
> - FNM_PATHNAME)); */
> + zassert_ok(fnmatch("*[[:alpha:]]/""*[[:alnum:]]", "a/b",
> + FNM_PATHNAME));
Re-enable this test; it looks good. The "" will be elided by the compiler,
so I'm not sure why it's here.
> - zassert_not_equal(fnmatch("*[![:digit:]]*/[![:d-d]", "a/b",
> - FNM_PATHNAME), 0);
> + zassert_ok(fnmatch("*[![:digit:]]*/[![:d-d]", "a/b", FNM_PATHNAME));
This pattern matches the string:
* - matches zero characters
[![:digit:]] - matches 'a' (it's not a digit)
* - matches zero characters
/ - matches slash (FNM_PATHNAME specified)
[![:d-d] - matches 'b' (not in the set "[:d-d")
> - zassert_not_equal(fnmatch("*[![:digit:]]*/[[:d-d]", "a/[",
> - FNM_PATHNAME), 0);
> + zassert_ok(fnmatch("*[![:digit:]]*/[[:d-d]", "a/[", FNM_PATHNAME));
This pattern matches the string:
* - matches zero characters
[![:digit:]] - matches 'a'
* - matches zero characters
/ - matches / (FNM_PATHNAME is specified)
[[:d-d] - matches [ (in the set "[:d-d")
I've run all of these tests against glibc which agrees with this version,
and also picolibc, which also works with a pending PR that adds support for
character classes, equivalence classes and collating sequences.
Signed-off-by: Keith Packard <keithp@keithp.com>
2390aa5 to
4e1e77c
Compare
Zephyr fnmatch() fails couple of newly added tests so disable them if using our own version. Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
80190cd to
70f5610
Compare
|
| depends on "$(ZEPHYR_TOOLCHAIN_VARIANT)" = "arcmwdt" | ||
| select STATIC_INIT_GNU | ||
| select LIBC_ALLOW_LESS_THAN_64BIT_TIME | ||
| select COMMON_LIBC_FNMATCH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| select COMMON_LIBC_FNMATCH |
| depends on "$(ZEPHYR_TOOLCHAIN_VARIANT)" = "iar" | ||
| select COMMON_LIBC_STRNLEN | ||
| select COMMON_LIBC_TIME | ||
| select COMMON_LIBC_FNMATCH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| select COMMON_LIBC_FNMATCH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please undo the changes to this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please undo the changes to this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please undo the changes to this file
| #include <fnmatch.h> | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really only be done after #97855 is merged (although, I guess it is part of that PR already).
| #include <errno.h> | ||
| #include <string.h> | ||
| #include <zephyr/posix/fnmatch.h> | ||
| #include <fnmatch.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really only be done after #97855 is merged (although, I guess it is part of that PR already).
| */ | ||
|
|
||
| #define _GNU_SOURCE /* for FNM_LEADING_DIR in fnmatch.h */ | ||
| #include <fnmatch.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really only be done after #97855 is merged (although, I guess it is part of that PR already).
| zassert_ok(fnmatch("[]?*\\]", "]", 0)); | ||
| if (!IS_ENABLED(CONFIG_COMMON_LIBC_FNMATCH)) { | ||
| /* \\ escapes ], no bracket expr, no match */ | ||
| zassert_equal(fnmatch("[[?*\\]", "\\", 0), FNM_NOMATCH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect.
If FNM_NOESCAPE is not set in flags, a backslash character in pattern followed by any other character shall match that second character in string
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fnmatch.html
In the assertion below, FNM_NOESCAPE is not set, so the \\ sequence in pattern matches the (properly escaped and singular) backslash character \. The function should return successfully. Below, pattern should be interpreted as matching any of [, ?, *, \.
zassert_ok(fnmatch("[[?*\\]", "\\", 0));If Picolibc's implementation does not work here, it has at least one bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler will convert \\ in the source code to a single \ in the resulting string, and that will escape the ] character. I reviewed the musl source code and it appears to have a bug here; it doesn't check to see if the ] is escaped with \. https://git.musl-libc.org/cgit/musl/tree/src/regex/fnmatch.c#n72
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at musl's implementation of fnmatch(), just the test data. The original implementation of fnmatch in Zephyr did not include any tests, so the tests were kind of retrofitted.
I think I misinterpreted the pattern, and I can understand why the original author of the test data probably also misinterpreted the pattern due to the wording of the spec.
zassert_ok(fnmatch("[[?*\\]", "\\", 0));So [[?* is pretty clear, but I guess \\] is the part that probably isn't clear. So this will get compiled to char pattern[] = ['\\', ']'], which would seem like an escape of the ending square bracket (so more like "\]"), which would support the comment above \\ escapes ], no bracket expr, no match.
I think I can agree with that now, but it required a retake.
The alternative below provided in this PR seems to be more correct
zassert_ok(fnmatch("[[?*\\\\]", "\\", 0));Ignoring the other options in the bracket expression, [?*, then the bracket expression could be simplified to "[\\\\]", or basically a single escaped backslash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, having both the compiler and the function parsing the same escape sequences is confusing at best.
As to changing the test case, they're both "legitimate" test cases; the original one uncovers a bug in the Musl implementation which is not present in the newlib, picolibc or glibc implementations, so we should keep it. Adding your modified version is probably also reasonable (more tests == more better?).
| /* \\ unescaped, match */ | ||
| zassert_ok(fnmatch("[[?*\\]", "\\", FNM_NOESCAPE)); | ||
| /* \\ escapes \\, match */ | ||
| zassert_ok(fnmatch("[[?*\\\\]", "\\", 0)); | ||
| /* \\ unescaped, match */ | ||
| zassert_ok(fnmatch("[]?*\\]", "]", FNM_NOESCAPE)); | ||
| /* \\ escapes \\, match */ | ||
| zassert_ok(fnmatch("[]?*\\\\]", "]", 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little hesitant to add these, and it isn't clear to me at the moment if they add to coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They test the four cases where musl and glibc differ because musl doesn't allow for an escaped ] in the pattern.
I think https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01 is pretty clear on this point --fnmatch should interpret \ followed by ] as a plain character and not the end of the bracket expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This first one makes sense to me
The second one makes sense to me.
The third one (below) is a bit tricky, because [] (empty bracket expressions) are undefined in POSIX.
Further clarification is in
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05
The ( ']' ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial ( '^' ), if any)
So while it might be tempting to interpret []? as an empty bracket expression followed by a "maximally once" operator, the spec says not to do that.
/* \\ unescaped, match */
zassert_ok(fnmatch("[]?*\\]", "]", FNM_NOESCAPE));The last case is similar at the beginning, and of course allows escapes, so the \\ followed by \\ is just an escaped backslash that doesn't match anything in the input.
/* \\ escapes \\, match */
zassert_ok(fnmatch("[]?*\\\\]", "]", 0));So the third and fourth cases make sense to me too.
I would probably also say that another pattern to include in addition to the fourth case where there are 0 flags, would be below
/* \\ escapes \\, match */
zassert_ok(fnmatch("[\\]?*\\\\]", "]", 0));Which just explicitly escapes the first ] in the bracket expression. It isn't needed, but being a bit more explicit about the interpretation of ] would be helpful for any future readers (and probably a link back to the spec would help).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This first one makes sense to me The second one makes sense to me.
The third one (below) is a bit tricky,
Tricky, but well defined -- fnmatch("[]?*\\]", "]", FNM_NOESCAPE) matches ] because the bracket expression matches any of ], ?, * or \. The latter because FNM_NOESCAPE means that the \ doesn't get interpreted as an escape for the trailing ].
I would probably also say that another pattern to include in addition to the fourth case where there are 0 flags, would be below
/* \\ escapes \\, match */ zassert_ok(fnmatch("[\\]?*\\\\]", "]", 0));Which just explicitly escapes the first
]in the bracket expression. It isn't needed, but being a bit more explicit about the interpretation of]would be helpful for any future readers (and probably a link back to the spec would help).
Additional annotations for the test cases is probably a good idea. Note that we want to poke the implementation in corner cases, so while I would probably write "[\\]?*\\\\]" to make my intention clear, the spec says that is equivalent to "[]?*\\\\]", so we should test both to make sure either form works.
| /* \\ escapes ], no bracket expr, no match */ | ||
| zassert_equal(fnmatch("[[?*\\]", "\\", 0), FNM_NOMATCH); | ||
| /* \\ escapes ], no bracket expr, no match */ | ||
| zassert_equal(fnmatch("[]?*\\]", "]", 0), FNM_NOMATCH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a little trickier.
Just mid-release, so give me some time to revisit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, these are just more versions of the above test -- because the ] is preceded by a single \, it will be escaped and hence the pattern doesn't actually contain a bracket expression at all.
| zassert_equal(fnmatch("*/*", "a/.b", FNM_PATHNAME | FNM_PERIOD), FNM_NOMATCH); | ||
| zassert_ok(fnmatch("*?*/*", "a/.b", FNM_PERIOD)); | ||
| zassert_ok(fnmatch("*[.]/b", "a./b", FNM_PATHNAME | FNM_PERIOD)); | ||
| /* zassert_ok(fnmatch("*[[:alpha:]]/""*[[:alnum:]]", "a/b", FNM_PATHNAME)); */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not remove this line. It's there intentionally.
We should wait for #97400 to be merged. I've been working with the first-time contributor for over a month to ensure the PR is ready, and I think it would be great to help them to push it past the finish line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This one relies on the library supporting the character class elements of the pattern. Newlib, picolibc 1.8.10 and Zephyr all lack support for these, so we will need to make fixed versions available before this test will pass though.
| /* Disable tests that fail when using Zephyr's fnmatch implementation */ | ||
| if (!IS_ENABLED(CONFIG_POSIX_C_LIB_EXT)) { | ||
| zassert_ok(fnmatch("*[[:alpha:]]/""*[[:alnum:]]", "a/b", FNM_PATHNAME)); | ||
| zassert_ok(fnmatch("*[![:digit:]]*/[![:d-d]", "a/b", FNM_PATHNAME)); | ||
| zassert_ok(fnmatch("*[![:digit:]]*/[[:d-d]", "a/[", FNM_PATHNAME)); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These change should be reverted, and the calls should all fail, according to SUSv3. See XCU 2.13.1, XBD 9.3.5
https://git.musl-libc.org/cgit/libc-testsuite/tree/fnmatch.c#n113
I'm just taking the musl libc tester's word for it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first test is pretty clearly correct, although it relies upon having character class support.
The other two tests check for patterns containing invalid character class expressions. Here, I can understand the difference between musl and glibc. Glibc (and picolibc) interpret an invalid character class expression as the characters themselves, so [![:d-d] matches the complement of the set [ : d. Musl returns an error value, indicating that the expression was ill-formed.
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html#tag_09_03_05 seems unclear on this.
It does say that [ only introduces a bracket expression if the following characters form a valid bracket expression, so patterns which contain [ and then never contain an un-escaped ] are treated as a sequence of plain characters. For character classes, equivalence classes and collating sequences, it just says that they shall be followed by a valid expression and matching terminating sequence. I think that means that the behavior of fnmatch with these expressions is undefined, so we should remove these tests.
| imply COMPILER_FREESTANDING | ||
| select COMMON_LIBC_ABORT | ||
| select COMMON_LIBC_STRNLEN | ||
| select COMMON_LIBC_FNMATCH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| select COMMON_LIBC_FNMATCH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please undo the changes to this file.
| #include <string.h> | ||
|
|
||
| #include <zephyr/posix/fnmatch.h> | ||
| #include <fnmatch.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really only be done after #97855 is merged (although, I guess it is part of that PR already).
| All non-standard functions and macros listed above must be implemented by the | ||
| :ref:`common libc <c_library_common>` in order to make sure that these | ||
| functions can be made available when using a C standard library that does not | ||
| implement these functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh jeez - I don't know if I've ever seen this. It doesn't make sense to transplant POSIX functions (that are legitimately POSIX functions and not C functions) into the libc/common just for the sake of not using standards or taking shortcuts.
This is a trend we really need to get away from in the Zephyr Project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the paragraph directly below it is semi-contradictory.
Let's just use standards!
Interesting. I used glibc as my oracle, along with a careful reading of the POSIX spec and tests. I then updated picolibc to support character classes, equivalence classes and collating sequences after which picolibc and glibc agreed on all of the Zephyr tests, as well as the existing picolibc CI tests. You can see those tests here: https://github.com/picolibc/picolibc/blob/main/test/libc-testsuite/fnmatch.c That test can be compiled standalone with the native compiler,
I'll give the musl version a try and see what I discover
Thanks for your attention; as I said, I don't think I need to be involved in the structural issues at play here, my concern is strictly on POSIX conformance of the |
|
@jukkar @keith-packard - I think I would really like to see this PR being transitioned more to
The picolibc version of fnmatch should Just Work at that point as a drop in replacement, which is (I think?) what everyone wants, right? The issues with Does that make sense to everyone? |
For that we'd need users of Because it is unlikely embedded C libraries will provide strfmon() or getentropy(), and it is I would certainly like the fnmatch test being fixed, and the C library (pico or others) fnmatch being used by default when it exists. |
A regression in 936d027 introduced a subtle bug in the way that escaped expressions were handled. The regression originated with the assumption that test data (originally adapted from a 3rd-party testsuite) was correct when it was in fact flawed. Specifically, `fnmatch("[[?*\\]", "\\", 0)` should fail, since the "\\" sequence (a single backslash after compilation) escapes the following ']' character, thus leaving the bracket expression incomplete. Since the bracket expression (and therefore the whole pattern) is erroneous, the call should return `FNM_NOMATCH` to indicate failure rather than 0 to indicate success. Added new test cases from zephyrproject-rtos#98827 and some commentary for subsequent reviewers. This change does not complete zephyrproject-rtos#55186 but is related to it. Signed-off-by: Chris Friedt <cfriedt@tenstorrent.com> Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no> Signed-off-by: Keith Packard <keithp@keithp.com>
A regression in 936d027 introduced a subtle bug in the way that escaped expressions were handled. The regression originated with the assumption that test data (originally adapted from a 3rd-party testsuite) was correct when it was in fact flawed. Specifically, `fnmatch("[[?*\\]", "\\", 0)` should fail (`FNM_NOMATCH`), since the "\\" sequence (a single backslash after compilation) escapes the following ']' character, thus leaving the bracket expression incomplete. As @keith-packard has pointed out, https://pubs.opengroup.org/onlinepubs/9699919799/utilities/\ V3_chap02.html#tag_18_13_01 says that a bracket expression is only interpreted as a bracket expression, when a proper bracket expression is formed. Therefore, the pattern is interpreted as the sequence `'['`, `'['`, `'?'`, `*` (wildcard), `']'` and the call should return `FNM_NOMATCH` to indicate failure rather than 0 to indicate success. Added new test cases from zephyrproject-rtos#98827 and some commentary for subsequent reviewers. This change does not completely fix zephyrproject-rtos#55186 but is related to it. Signed-off-by: Chris Friedt <cfriedt@tenstorrent.com> Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no> Signed-off-by: Keith Packard <keithp@keithp.com>
@aescolar - #99442 should fix compatibility issues we've seen with If any embedded C libraries do not provide |
There's no real way to provide As for |
@cfriedt I was curious about this topic since it keeps coming up and we seem to have different definitions of C standard library and POSIX. I did a bit of research and I am quite sure that @nashif is right:
So the correct statement would be what Anas said, and thus it's an extension to the C library defined by POSIX, and so one can expect C libraries in general to implement it (although not always). |
|
@carles - If you're asking in order to justify moving POSIX code out of Doing so would just be adding more technical debt to the project, which should be avoided. Indeed, we have spent many years at this point, untangling the assumed POSIX availability in a lot of areas in Zephyr, and making those dependencies explicit (via Kconfig) rather than assumed. |
I agree on this. Technically, it's possible to indirectly access a PRNG via ANSI C In Zephyr, Personally, I think lumping |
@cfriedt @keith-packard CONFIG_POSIX_C_LIB_EXT_FNMATCH CONFIG_POSIX_C_LIB_EXT would just select all these CONFIG_POSIX_C_LIB_EXT_* |
That's not something I would agree too, as it would basically be doing the opposite of grouping functionality into standard subprofiling option groups and therefore add more technical debt. Beyond that, what problem would fragmenting standard option groups be perceived to fix? |



fnmatch() is used by the shell, http server & coverage modules. Given that quite a few subsystems need it, let's add it to the C library common functions, and to the expected functions from the C library in rule A.5, instead of relaying on the corresponding POSIX subsystem option being permanently enabled.
Several C libraries already implement it. For those libc which do not, enable the common implementation found in this commit.
Relates to