Skip to content

Conversation

@martonvago
Copy link
Contributor

@martonvago martonvago commented Nov 21, 2025

Description

This PR integrates cdp πŸŽ‰ .
Says many files, but most were just deleted πŸ”₯ .

Closes #1582

Needs an in-depth review.

Checklist

  • Formatted Markdown
  • Ran just run-all

Comment on lines +150 to +156
package_required_checks = _map(
PACKAGE_SPROUT_REQUIRED_FIELDS,
lambda field: cdp.RequiredCheck(
jsonpath=f"$.{field}",
message=f"'{field}' is a required property",
),
)
Copy link
Contributor Author

@martonvago martonvago Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some required fields are required already in cdp, but I thought not separating those out looked cleaner. Issues will be deduplicated anyway.

One thing to note about deduplication is that for an issue to count as a duplicate all properties must be the same, including message. This actually places a restriction on the message format, especially in RequiredChecks, because the format is determined by jsonschema. To prevent bugs, maybe we should

  • not include the message when deciding if two issues are the same (at least for type="required")
  • or autoformat the message inside RequiredCheck?

Not sure how explain will influence this.

Comment on lines +166 to +181
not_blank_resource_fields = _map(
RESOURCE_SPROUT_REQUIRED_FIELDS, lambda field: f"$.resources[*].{field}"
)
not_blank = cdp.CustomCheck(
jsonpath=(
f"$.{' | '.join(PACKAGE_SPROUT_REQUIRED_FIELDS)} "
"| $.contributors[*].title"
"| $.sources[*].title"
"| $.licenses[*].name"
"| $.licenses[*].path"
f"| {' | '.join(not_blank_resource_fields)}"
),
message="This property must not be empty.",
check=lambda value: bool(value),
type="not-blank",
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also create a separate CustomCheck for each path instead of the union

type="resource-path-string",
)
resource_path_format = cdp.CustomCheck(
jsonpath="$.resources[*]",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Targeting the entire resource because in the check function below I need access to the resource name to check the path.

The downside is that the JSON path in the issue will be, e.g., $.resources[0], and not $.resources[0].path.

Comment on lines +197 to +205
no_resource_data = cdp.CustomCheck(
jsonpath="$.resources[*].data",
message=(
"Sprout doesn't use the `data` field, instead it expects data "
"in separate files that are given in the `path` field."
),
check=lambda value: value is None,
type="no-inline-data",
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this, but, because there is no data attribute in ResourceProperties in Sprout, even if data is added manually, compact_dict will not include it in the dict representation. So, even if data is present in datapackage.json, it will not be present in the properties passed to cdp.check. So no_resource_data will never fail.

Should we do something about this?

Comment on lines +207 to +209
exclude_resource_path_or_data_required = cdp.Exclusion(
jsonpath="$.resources[*]", type="required"
)
Copy link
Contributor Author

@martonvago martonvago Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only matches the case when both path and data are missing. Ignoring because we don't want any feedback on data. The missing path will be flagged separately.

Comment on lines +210 to +212
exclude_resource_path_type = cdp.Exclusion(
jsonpath="$.resources[*].path", type="type"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the DP standard, path can be string or array. If path is another type, the feedback will reference both string and array. In Sprout, we only allow string paths, so I'm excluding this check. That path is a string will be checked separately.

Comment on lines +213 to +215
exclude_resource_path_pattern = cdp.Exclusion(
jsonpath="$.resources[*].path", type="pattern"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning as above: Sprout has its own rules for what paths should look like.

Comment on lines +216 to +218
exclude_resource_path_min_items = cdp.Exclusion(
jsonpath="$.resources[*].path", type="minItems"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If path is an array, we are already flagging it as not a string (the Sprout rule), so I'm excluding specific feedback about bad arrays.

return properties


def _check_resource_path_format(resource_properties: Any) -> bool:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from get_sprout_resource_errors.py

pyproject.toml Outdated
build-backend = "hatchling.build"

[tool.uv.sources]
check-datapackage = { git = "https://github.com/seedcase-project/check-datapackage" }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope this is correct for the time being

Comment on lines -63 to -65
# Even when resources isn't there.
delattr(properties, "resources")
assert check_properties(properties) == properties
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to test below

# All properties checks
with raises(cdp.DataPackageError) as error1:
check_properties(properties)
assert "`data`" not in str(error1.value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asserts of this style are imo pretty brittle and imprecise, but I really wanted to test each config item that we define in Sprout. It took multiple days to write them and the only way I could keep track of them working together in the expected way was by repeatedly running these tests. So that feels like a good indication that these tests are useful.

@martonvago martonvago moved this from Todo to In Review in Iteration planning Nov 21, 2025
@martonvago martonvago marked this pull request as ready for review November 21, 2025 12:05
@martonvago martonvago requested a review from a team as a code owner November 21, 2025 12:05
data_package_error: cdp.DataPackageError,
) -> None:
"""Create a `DataResourceError` from a `cdp.DataPackageError`."""
message = str(data_package_error).replace("package.resources[0]", "resource")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this can be done more intelligently at this point, but it would be nicer if I wasn't replacing blindly.

Example output:

E   seedcase_sprout.check_properties.DataResourceError: 4 issues were found in your `datapackage.json`:
E   
E   At resource.description:
E   |
E   | description: None
E   |              ^^^^
E   'description' is a required property

Should we change/remove the reference to datapackage.json?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there is something slightly off with instance, but we can look at that as part of another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Integrate check-datapackage into Sprout

2 participants