-
Notifications
You must be signed in to change notification settings - Fork 1
feat: β¨ integrate check-datapackage
#1605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| package_required_checks = _map( | ||
| PACKAGE_SPROUT_REQUIRED_FIELDS, | ||
| lambda field: cdp.RequiredCheck( | ||
| jsonpath=f"$.{field}", | ||
| message=f"'{field}' is a required property", | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some required fields are required already in cdp, but I thought not separating those out looked cleaner. Issues will be deduplicated anyway.
One thing to note about deduplication is that for an issue to count as a duplicate all properties must be the same, including message. This actually places a restriction on the message format, especially in RequiredChecks, because the format is determined by jsonschema. To prevent bugs, maybe we should
- not include the
messagewhen deciding if two issues are the same (at least fortype="required") - or autoformat the message inside
RequiredCheck?
Not sure how explain will influence this.
| not_blank_resource_fields = _map( | ||
| RESOURCE_SPROUT_REQUIRED_FIELDS, lambda field: f"$.resources[*].{field}" | ||
| ) | ||
| not_blank = cdp.CustomCheck( | ||
| jsonpath=( | ||
| f"$.{' | '.join(PACKAGE_SPROUT_REQUIRED_FIELDS)} " | ||
| "| $.contributors[*].title" | ||
| "| $.sources[*].title" | ||
| "| $.licenses[*].name" | ||
| "| $.licenses[*].path" | ||
| f"| {' | '.join(not_blank_resource_fields)}" | ||
| ), | ||
| message="This property must not be empty.", | ||
| check=lambda value: bool(value), | ||
| type="not-blank", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also create a separate CustomCheck for each path instead of the union
| type="resource-path-string", | ||
| ) | ||
| resource_path_format = cdp.CustomCheck( | ||
| jsonpath="$.resources[*]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Targeting the entire resource because in the check function below I need access to the resource name to check the path.
The downside is that the JSON path in the issue will be, e.g., $.resources[0], and not $.resources[0].path.
| no_resource_data = cdp.CustomCheck( | ||
| jsonpath="$.resources[*].data", | ||
| message=( | ||
| "Sprout doesn't use the `data` field, instead it expects data " | ||
| "in separate files that are given in the `path` field." | ||
| ), | ||
| check=lambda value: value is None, | ||
| type="no-inline-data", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this, but, because there is no data attribute in ResourceProperties in Sprout, even if data is added manually, compact_dict will not include it in the dict representation. So, even if data is present in datapackage.json, it will not be present in the properties passed to cdp.check. So no_resource_data will never fail.
Should we do something about this?
| exclude_resource_path_or_data_required = cdp.Exclusion( | ||
| jsonpath="$.resources[*]", type="required" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only matches the case when both path and data are missing. Ignoring because we don't want any feedback on data. The missing path will be flagged separately.
| exclude_resource_path_type = cdp.Exclusion( | ||
| jsonpath="$.resources[*].path", type="type" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the DP standard, path can be string or array. If path is another type, the feedback will reference both string and array. In Sprout, we only allow string paths, so I'm excluding this check. That path is a string will be checked separately.
| exclude_resource_path_pattern = cdp.Exclusion( | ||
| jsonpath="$.resources[*].path", type="pattern" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reasoning as above: Sprout has its own rules for what paths should look like.
| exclude_resource_path_min_items = cdp.Exclusion( | ||
| jsonpath="$.resources[*].path", type="minItems" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If path is an array, we are already flagging it as not a string (the Sprout rule), so I'm excluding specific feedback about bad arrays.
| return properties | ||
|
|
||
|
|
||
| def _check_resource_path_format(resource_properties: Any) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from get_sprout_resource_errors.py
pyproject.toml
Outdated
| build-backend = "hatchling.build" | ||
|
|
||
| [tool.uv.sources] | ||
| check-datapackage = { git = "https://github.com/seedcase-project/check-datapackage" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope this is correct for the time being
| # Even when resources isn't there. | ||
| delattr(properties, "resources") | ||
| assert check_properties(properties) == properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to test below
| # All properties checks | ||
| with raises(cdp.DataPackageError) as error1: | ||
| check_properties(properties) | ||
| assert "`data`" not in str(error1.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserts of this style are imo pretty brittle and imprecise, but I really wanted to test each config item that we define in Sprout. It took multiple days to write them and the only way I could keep track of them working together in the expected way was by repeatedly running these tests. So that feels like a good indication that these tests are useful.
| data_package_error: cdp.DataPackageError, | ||
| ) -> None: | ||
| """Create a `DataResourceError` from a `cdp.DataPackageError`.""" | ||
| message = str(data_package_error).replace("package.resources[0]", "resource") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think this can be done more intelligently at this point, but it would be nicer if I wasn't replacing blindly.
Example output:
E seedcase_sprout.check_properties.DataResourceError: 4 issues were found in your `datapackage.json`:
E
E At resource.description:
E |
E | description: None
E | ^^^^
E 'description' is a required property
Should we change/remove the reference to datapackage.json?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, there is something slightly off with instance, but we can look at that as part of another PR.
Description
This PR integrates
cdpπ .Says many files, but most were just deleted π₯ .
Closes #1582
Needs an in-depth review.
Checklist
just run-all