Skip to content

Conversation

@cyphar
Copy link

@cyphar cyphar commented Aug 18, 2025

The top-level mediaType member was added in response to CVE-2021-41190,
but while it is suggested (SHOULD) it is not required (MUST) and some
older tools do not fill this mediaType field (such as skopeo, at least
for "index.json").

I plan to use these jq-based validation scripts for umoci, but
incompatibility with skopeo is a little annoying (since that is what we
use to pull images for our tests). We can work around it for
"index.json", but it seems incorrect to claim that an image is invalid
because of a missing suggested field.

Instead, add an informational message but still permit such images.

Signed-off-by: Aleksa Sarai cyphar@cyphar.com

@cyphar cyphar requested a review from a team as a code owner August 18, 2025 17:41
@cyphar cyphar force-pushed the mediatype-optional branch from d16b138 to febc238 Compare August 18, 2025 17:47
The top-level mediaType member was added in response to CVE-2021-41190,
but while it is suggested (SHOULD) it is not required (MUST) and some
older tools do not fill this mediaType field (such as skopeo, at least
for "index.json").

I plan to use these jq-based validation scripts for umoci, but
incompatibility with skopeo is a little annoying (since that is what we
use to pull images for our tests). We can work around it for
"index.json", but it seems incorrect to claim that an image is invalid
because of a missing suggested field.

Instead, add an informational message but still permit such images.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
@cyphar cyphar force-pushed the mediatype-optional branch from febc238 to 4d7e69c Compare August 18, 2025 18:00
@tianon
Copy link
Member

tianon commented Aug 18, 2025

As I've noted over in #127 (comment), the point of this validation is not general purpose "is this valid according to the OCI" but more specifically "is this both valid according to the OCI and matches our more specific and opinionated expectations", which is why I made the very explicit decision to require mediaType at the top level of these objects (after deliberating on it for a very long time, and even going back and forth several times).

To put that more succinctly, every tool that we use to generate these objects already includes mediaType at the top-level, has been patched to do so, or will be patched to do so.

Given that these are all jq-based validation functions, my recommendation would be to include something like .mediaType //= media_type_oci_index or .mediaType //= media_type_oci_image as appropriate before invoking the validation function, if you intend for mediaType at the top-level of the object to be optional.

Even better would be to preserve the explicit mediaType from an OCI descriptor and use that value -- see also:

| if $desc then
validate_IN(.mediaType; $desc.mediaType)
| validate_IN(.artifactType; $desc.artifactType)
else . end

| if $desc then
validate_IN(.mediaType; $desc.mediaType)
| validate_IN(.artifactType; $desc.artifactType)
else . end

(which could very reasonably be .mediaType //= or be prefaced with, in an alternative implementation -- I'm not super happy with this shell script, but it does the job for now)

@cyphar
Copy link
Author

cyphar commented Aug 18, 2025

That's understandable but still a bit unfortunate.

I was hoping to be able to use this directly as a stop-gap for umoci validate (oci-image-tools validate broke very recently due to some unrelated repo changes in the image-spec repo) and these seemed like a good resource (as we kind of discussed in opencontainers/image-spec#637 recently). Setting mediaType manually was the workaround I initially used but it seemed preferable to upstream a fix for the scripts themselves.

I'm happy to carry these in a fork I guess (since it is ultimately just a stop-gap) but it would be nice to have a non-Go validation tool just in case (and I was hoping to be able to use both umoci validate and the oci.jq scripts here). Unfortunately, I wanted to use helpers/oci-validate.sh to avoid writing my own wrappers, which means that I would have to patch the scripts regardless.

@tianon
Copy link
Member

tianon commented Aug 18, 2025

If I could do all the OCI layout traversal in pure jq (thus deprecating most of the shell script), I'd probably be a lot more open to maintaining a more general-purpose "strict OCI" variant of this code (and then only maintain the variance/opinion here), but as-is it's hard to be motivated to write such a thing given how irritating the traversal is.

I've toyed with the idea of doing something with gojq (where I could inject my own helper functions), but the lack of order-preserving maps is a complete non-starter for me.

A few relevant jq links for ways this could be improved:

@polarathene
Copy link

polarathene commented Nov 21, 2025

I've toyed with the idea of doing something with gojq (where I could inject my own helper functions), but the lack of order-preserving maps is a complete non-starter for me.

I'm not 100% sure if jaq doesn't have that problem (this docs entry suggests it doesn't), but the other concern with base64 that you link to on the cryptographic hash issue comment isn't a problem with jaq (but binary data does seem reliant upon an upcoming 3.0 release, so you'd need to wait on that).

jaq has a library available in addition to the CLI, so if you're comfortable with Rust it might be an option that works for where gojq wasn't.

jaq also has import so that you can load in another JSON file, but after trying that I've realized it wouldn't work in this situation.

The assert issue linked can be adapted to try/error/catch with jaq? (EDIT: Nevermind this is the same syntax with jq already, I misunderstood the PR):

$ jaq -n '[{a: 4, b: 2}, {a: 7, b: 6}] | try error(map(.b) | add > 10) catch("your message here")'
"your message here"

nushell is an entire shell, but it has comparison pages to jq. Using it's own DSL to do the same operations technically matches your requirements to not call out to individual CLI tools. It does have the operations you're interested in:

You can use various operators and pipes with nushell, along with define custom methods, etc. It feels a bit awkward as a shell sometimes, but fairly capable of processing data. It does come with the convenience of a container or static binary but it's also a bit heavy in size for a utility compared to jq.

There's also yq which is fairly similar to jq but does have a load operator you need, and it has a Go library package.

fq (also Go) is a jq alternative with a focus on binary support. It has a tosha256 operator (EDIT: I see it was already mentioned at the jq issue), along with open to read in a file.

@tianon
Copy link
Member

tianon commented Nov 21, 2025

Thank you for the really detailed comment! 💖

Indeed, jaq is the solution I've had open in tabs in my browser for many months now hoping I'd find enough time/motivation to try implementing something I could use for this, and with the 3.0.0 alpha release it's looking even nicer than before, and why I was looking at jaq again yesterday/today (so your timing is hilarious - I hadn't seen this comment yet before chatting with you over in 01mf02/jaq#282 😂). Alas, time for such a thing is still scarse.

I've had a lot of prior bad times with yq in particular (I usually have it just parse my YAML or XML and output JSON so I can then shell out to jq instead because the behavior breaks in subtle ways otherwise), but I was really excited about fq - however, it does indeed suffer from the same map ordering issues as gojq, so it's unfortunately not the answer.

To your point, I tested jaq and it is indeed order-preserving, so there's probably hope for this project yet ("just" some Rust glue code to write). ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants