@ziadhany
Collaborator

@ziadhany ziadhany commented Nov 3, 2025

  • Introduce affected_by_commits and fixed_by_commits fields in our advisory
  • Update from_dict and to_dict methods
  • Update compute_checksum method
  • Create a CodeCommitData importer class
  • Update OSV to collect code fix commits
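As a rough illustration of the list above, here is a minimal sketch of how commit fields might round-trip through to_dict / from_dict and feed a checksum. The class names `CommitSketch` and `AffectedPackageSketch`, their field names, and the choice of SHA-1 over sorted JSON are assumptions made for this example; they are not taken from the PR's code.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CommitSketch:
    # Hypothetical stand-in for the PR's commit data; field names are assumed.
    commit_hash: str
    vcs_url: str

    def to_dict(self):
        return {"commit_hash": self.commit_hash, "vcs_url": self.vcs_url}

    @classmethod
    def from_dict(cls, data):
        return cls(commit_hash=data["commit_hash"], vcs_url=data["vcs_url"])


@dataclass
class AffectedPackageSketch:
    # Hypothetical container carrying the two new commit lists.
    purl: str
    affected_by_commits: list = field(default_factory=list)
    fixed_by_commits: list = field(default_factory=list)

    def to_dict(self):
        return {
            "purl": self.purl,
            "affected_by_commits": [c.to_dict() for c in self.affected_by_commits],
            "fixed_by_commits": [c.to_dict() for c in self.fixed_by_commits],
        }

    @classmethod
    def from_dict(cls, data):
        return cls(
            purl=data["purl"],
            affected_by_commits=[
                CommitSketch.from_dict(c) for c in data.get("affected_by_commits", [])
            ],
            fixed_by_commits=[
                CommitSketch.from_dict(c) for c in data.get("fixed_by_commits", [])
            ],
        )

    def compute_checksum(self):
        # Sorted keys keep the serialization stable, so equal content hashes equally.
        payload = json.dumps(self.to_dict(), sort_keys=True).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()
```

The point of including the commit lists in to_dict is that the checksum then changes whenever a commit is added or removed, so an advisory that gains a fix commit is not treated as unchanged.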

… in Advisory

Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@ziadhany ziadhany force-pushed the advisory-fix-commit-1 branch from 2af10cf to a8ec9f1 Compare November 4, 2025 15:58
@ziadhany ziadhany marked this pull request as ready for review November 4, 2025 16:01
Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@ziadhany ziadhany changed the title Add support for affected_by_commits and fixed_by_commits Add support for affected_by_commits, fixed_by_commits, and OSV code fix commits Nov 5, 2025
Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@ziadhany ziadhany requested review from TG1999 and keshav-space and removed request for keshav-space November 5, 2025 15:40
@TG1999
Contributor

TG1999 commented Nov 6, 2025

@ziadhany please add a description to the PR!

Add all the fields in keys for comparison CodeCommitData

Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@ziadhany ziadhany requested a review from TG1999 November 7, 2025 02:48
@TG1999
Contributor

TG1999 commented Nov 7, 2025

@ziadhany mostly looks good! Please run the importer once and paste the logs here. Thanks!

I want to see whether we are missing any data in the OSV format, and how the AdvisoryData and ImpactedPackages look with the new CommitData. Thanks!

Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@ziadhany
Collaborator Author

ziadhany commented Nov 7, 2025

@TG1999 This is the log output for the following importers:

  • pysec_importer_v2
  • pypa_importer_v2
  • oss_fuzz_importer_v2

importers_logs.zip

The database query results:
vulnerabilities_advisoryv2 Total rows: 10274
vulnerabilities_impactedpackage_fixed_by_commits Total rows: 4013
vulnerabilities_impactedpackage_affecting_commits Total rows: 3623
vulnerabilities_codecommit Total rows: 3791

@ziadhany ziadhany requested a review from TG1999 November 7, 2025 14:56
@TG1999
Contributor

TG1999 commented Nov 10, 2025

@ziadhany

Invalid VersionRange  for affected_pkg: {'package': {'name': 'apache-commons-io', 'ecosystem': 'OSS-Fuzz', 'purl': 'pkg:generic/apache-commons-io'}, 'ranges': [{'type': 'GIT', 'repo': 'https://github.com/apache/commons-io.git', 'events': [{'introduced': '72b1f88fb722def136ce87c9b2bfdd3c9126bb3d'}, {'fixed': 'd3e5bd6de8bc96abbadccea8b934dc038a32e90c'}]}], 'versions': ['commons-io-2.14.0-RC1', 'rel/commons-io-2.14.0'], 'ecosystem_specific': {'severity': 'LOW'}, 'database_specific': {'introduced_range': 'c511d15294d1a406a177368804014313948e2601:06fde31494c279ad940149e1a3d4944040c73c0d', 'fixed_range': '247c8e7d85a8df293011c7e9c94fd50bb2986fb7:d3e5bd6de8bc96abbadccea8b934dc038a32e90c'}} for OSV id: 'OSV-2023-962': error:InvalidVersion("'commons-io-2.14.0-RC1' is not a valid <class 'univers.versions.SemverVersion'>")
Invalid VersionRange  for affected_pkg: {'package': {'name': 'apache-commons-io', 'ecosystem': 'OSS-Fuzz', 'purl': 'pkg:generic/apache-commons-io'}, 'ranges': [{'type': 'GIT', 'repo': 'https://github.com/apache/commons-io.git', 'events': [{'introduced': '72b1f88fb722def136ce87c9b2bfdd3c9126bb3d'}, {'fixed': 'd3e5bd6de8bc96abbadccea8b934dc038a32e90c'}]}], 'versions': ['commons-io-2.14.0-RC1', 'rel/commons-io-2.14.0'], 'ecosystem_specific': {'severity': 'LOW'}, 'database_specific': {'introduced_range': 'c511d15294d1a406a177368804014313948e2601:06fde31494c279ad940149e1a3d4944040c73c0d', 'fixed_range': '247c8e7d85a8df293011c7e9c94fd50bb2986fb7:d3e5bd6de8bc96abbadccea8b934dc038a32e90c'}} for OSV id: 'OSV-2023-618': error:InvalidVersion("'commons-io-2.14.0-RC1' is not a valid <class 'univers.versions.SemverVersion'>")

Why are we getting these logs? The commit data should have been created for these advisories.

@TG1999
Contributor

TG1999 commented Nov 10, 2025

See all the Invalid VersionRange errors. Why are these occurring?

{'package': {'name': 'apache-commons-codec', 'ecosystem': 'OSS-Fuzz', 'purl': 'pkg:generic/apache-commons-codec'}, 'ranges': [{'type': 'GIT', 'repo': 'https://gitbox.apache.org/repos/asf/commons-codec.git', 'events': [{'introduced': '44e4c4d778c3ab87db09c00e9d1c3260fd42dad5'}, {'fixed': '3bf874e2141dc08550c0b330c7a7006f358bb0f0'}]}], 'versions': ['commons-codec-1.16.1-RC1', 'rel/commons-codec-1.16.1'], 'ecosystem_specific': {'severity': 'LOW'}, 'database_specific': {'fixed_range': '72c40fe6f62410bcaa019dbf2cb570ee4e49b70e:3bf874e2141dc08550c0b330c7a7006f358bb0f0'}} for OSV id: 'OSV-2023-1195': error:InvalidVersion("'commons-codec-1.16.1-RC1' is not a valid <class 'univers.versions.SemverVersion'>")

when we have introduced and fixed events to create code commit data.
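To make the expectation concrete, here is a minimal sketch of pulling introduced and fixed commits out of the GIT ranges of an OSV `affected` entry, independently of whether the `versions` list can be parsed. The function name `extract_git_commits` and its return shape are hypothetical and not the PR's implementation; the input layout follows the entry logged above.

```python
def extract_git_commits(affected_pkg):
    """Return (introduced, fixed) lists of (repo, commit) pairs from GIT ranges.

    Hypothetical helper: only ranges of type GIT are read, and the
    'versions' list is never touched, so an unparsable version string
    cannot prevent commit data from being collected.
    """
    introduced, fixed = [], []
    for rng in affected_pkg.get("ranges", []):
        if rng.get("type") != "GIT":
            continue
        repo = rng.get("repo")
        for event in rng.get("events", []):
            # In OSV, an introduced value of "0" means "from the first commit",
            # so it does not name a specific commit and is skipped here.
            if "introduced" in event and event["introduced"] != "0":
                introduced.append((repo, event["introduced"]))
            if "fixed" in event:
                fixed.append((repo, event["fixed"]))
    return introduced, fixed
```

Applied to the OSV-2023-1195 entry above, this would yield one introduced commit and one fixed commit for the commons-codec repository, even though `commons-codec-1.16.1-RC1` is rejected as a version.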

…ported

Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@ziadhany
Collaborator Author

ziadhany commented Nov 11, 2025

I updated the script to handle unsupported packages (especially for OSS-Fuzz). CodeCommit is no longer ignored even if the package is unsupported, and logs are now more meaningful.

These are the updated logs:
importers_v2.zip

The database query results:
vulnerabilities_advisoryv2 Total rows: 17041
vulnerabilities_impactedpackage_fixed_by_commits Total rows: 7343
vulnerabilities_impactedpackage_affecting_commits Total rows: 6553
vulnerabilities_codecommit Total rows: 6553

Issues related:

  • pysec_importer_v2 / pypa_importer_v2:
  • oss_fuzz_importer_v2
    • Unsupported package type: None in OSV: 'OSV-2021-1227'. This means the package type is unknown (e.g., generic), and there is no PURL associated with it.
    • Invalid VersionRange for affected_pkg: this depends on whether the version string is valid for the expected scheme, for example semver.
      Example:
      > SemverVersion('commons-io-2.14.0-RC1')
      > univers.versions.InvalidVersion: 'commons-io-2.14.0-RC1' is not a valid <class 'univers.versions.SemverVersion'>

Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@TG1999
Contributor

TG1999 commented Nov 11, 2025

ERROR 2025-11-11 13:34:49.213781 UTC Unsupported PyPI advisory data file: GHSA-227r-w5j2-6243.json

This log does not tell me much. What is the data, and why is it unsupported?

@TG1999
Contributor

TG1999 commented Nov 11, 2025

Invalid VersionRange for affected_pkg: ['0.8', '0.9', '0.9.3', '0.9.4', '0.9.5', '0.9.6', '0.9.7', '0.9.8', '0.9.9', '2.0.1', '2.0.1rc1', '2.0.1rc2-git', '2.0.1rc3', '2.0.1rc4', '2.0.2', '2.0.3', '2.0.4', '2.0.5', '2.0b4', '2.0b5', '2.0b6', '2.0b7', '2.0b8', '2.0b9', '3.0.0', '3.0.0b1', '3.0.0b2', '3.0.1', '3.0.2', '3.0.3', '3.0.4', '3.0.5', '3.1', '3.2', '3.2.1', '3.2.2', '3.2.3', '3.2.4', '3.2.5', '3.3', '3.4', '3.4.1', '3.4.2', '3.4.3', '3.4.4', '3.4.5', '3.5', '3.5b1', '3.6', '3.6.1', '3.6.2', '3.6.3', '3.6.4'] for OSV id: 'PYSEC-2021-859': error:InvalidVersion("'2.0.1rc2-git' is not a valid <class 'univers.versions.PypiVersion'>")

One version in the list might not be valid, but all the others are. Are we ingesting them, or skipping the whole list if we can't ingest one?

@ziadhany
Collaborator Author

ERROR 2025-11-11 13:34:49.213781 UTC Unsupported PyPI advisory data file: GHSA-227r-w5j2-6243.json

This log does not tell me much. What is the data, and why is it unsupported?

@TG1999 We are ignoring GHSA files since we target only PYSEC files.
https://github.com/aboutcode-org/vulnerablecode/blob/main/vulnerabilities/pipelines/v2_importers/pysec_importer.py#L54

@TG1999
Contributor

TG1999 commented Nov 11, 2025

ERROR 2025-11-11 13:34:49.213781 UTC Unsupported PyPI advisory data file: GHSA-227r-w5j2-6243.json

This log does not tell me much. What is the data, and why is it unsupported?

@TG1999 We are ignoring GHSA files since we target only PYSEC files.

https://github.com/aboutcode-org/vulnerablecode/blob/main/vulnerabilities/pipelines/v2_importers/pysec_importer.py#L54

Then add that to the log as well :)
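One way to put the reason into the log, sketched under assumptions: the helper names `skip_reason` and `log_skipped`, the logger name, and the filename-prefix check are all hypothetical and may not match how the importer actually decides which files to skip.

```python
import logging

logger = logging.getLogger("pysec_importer_sketch")  # hypothetical logger name


def skip_reason(filename):
    """Return a human-readable reason to skip a file, or None to import it."""
    if not filename.endswith(".json"):
        return "not a JSON advisory file"
    if not filename.startswith("PYSEC-"):
        # Assumed rule, based on the comment above: only PYSEC files are targeted.
        return "only PYSEC-* advisories are imported by this pipeline"
    return None


def log_skipped(filename):
    """Log skipped files together with the reason and return that reason."""
    reason = skip_reason(filename)
    if reason:
        logger.error("Unsupported PyPI advisory data file: %s (%s)", filename, reason)
    return reason
```

With this shape, the message for GHSA-227r-w5j2-6243.json would carry the explanation that only PYSEC files are targeted, rather than just the file name.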

@ziadhany
Collaborator Author

ziadhany commented Nov 11, 2025

Invalid VersionRange for affected_pkg: ['0.8', '0.9', '0.9.3', '0.9.4', '0.9.5', '0.9.6', '0.9.7', '0.9.8', '0.9.9', '2.0.1', '2.0.1rc1', '2.0.1rc2-git', '2.0.1rc3', '2.0.1rc4', '2.0.2', '2.0.3', '2.0.4', '2.0.5', '2.0b4', '2.0b5', '2.0b6', '2.0b7', '2.0b8', '2.0b9', '3.0.0', '3.0.0b1', '3.0.0b2', '3.0.1', '3.0.2', '3.0.3', '3.0.4', '3.0.5', '3.1', '3.2', '3.2.1', '3.2.2', '3.2.3', '3.2.4', '3.2.5', '3.3', '3.4', '3.4.1', '3.4.2', '3.4.3', '3.4.4', '3.4.5', '3.5', '3.5b1', '3.6', '3.6.1', '3.6.2', '3.6.3', '3.6.4'] for OSV id: 'PYSEC-2021-859': error:InvalidVersion("'2.0.1rc2-git' is not a valid <class 'univers.versions.PypiVersion'>")

One version in the list might not be valid, but all the others are. Are we ingesting them, or skipping the whole list if we can't ingest one?

We are skipping the entire list, since the version range would likely be inconsistent if we processed it.
I also created a related issue in univers:

I can change this if needed.

Signed-off-by: ziad hany <ziadhany2016@gmail.com>
@TG1999
Contributor

TG1999 commented Nov 11, 2025

Invalid VersionRange for affected_pkg: ['0.8', '0.9', '0.9.3', '0.9.4', '0.9.5', '0.9.6', '0.9.7', '0.9.8', '0.9.9', '2.0.1', '2.0.1rc1', '2.0.1rc2-git', '2.0.1rc3', '2.0.1rc4', '2.0.2', '2.0.3', '2.0.4', '2.0.5', '2.0b4', '2.0b5', '2.0b6', '2.0b7', '2.0b8', '2.0b9', '3.0.0', '3.0.0b1', '3.0.0b2', '3.0.1', '3.0.2', '3.0.3', '3.0.4', '3.0.5', '3.1', '3.2', '3.2.1', '3.2.2', '3.2.3', '3.2.4', '3.2.5', '3.3', '3.4', '3.4.1', '3.4.2', '3.4.3', '3.4.4', '3.4.5', '3.5', '3.5b1', '3.6', '3.6.1', '3.6.2', '3.6.3', '3.6.4'] for OSV id: 'PYSEC-2021-859': error:InvalidVersion("'2.0.1rc2-git' is not a valid <class 'univers.versions.PypiVersion'>")

One version in the list might not be valid, but all the others are. Are we ingesting them, or skipping the whole list if we can't ingest one?

We are skipping the entire list, since the version range would likely be inconsistent if we processed it.

I also created a related issue in univers:

I can change this if needed.

@keshav-space @pombredanne thoughts on this one?

@TG1999
Contributor

TG1999 commented Nov 11, 2025

For PYSEC data we would use the GitHub version range, because the versions are Semver. If a version is not parsable, only that version should be skipped, not the entire range. We should also introduce a flag for advisories that were not completely parsed, so that if our parsing improves in the future we can replace the incompletely parsed advisory with a new one.
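A minimal sketch of that proposal, assuming a per-version validity check: invalid versions are set aside one by one and the result is flagged as partially parsed. The regex below is a deliberately simplified stand-in and not the univers parser, and the names `looks_valid`, `parse_affected_versions`, and `is_partially_parsed` are hypothetical.

```python
import re

# Simplified stand-in for a real version parser: digits separated by dots,
# optionally followed by an a/b/rc pre-release tag. NOT the univers rules.
_SIMPLE_VERSION = re.compile(r"^\d+(\.\d+)*((a|b|rc)\d+)?$")


def looks_valid(version):
    """Return True when the version matches the simplified pattern."""
    return bool(_SIMPLE_VERSION.match(version))


def parse_affected_versions(versions, is_valid=looks_valid):
    """Keep valid versions, set aside invalid ones, and flag partial parses."""
    valid, skipped = [], []
    for version in versions:
        if is_valid(version):
            valid.append(version)
        else:
            skipped.append(version)
    return {
        "versions": valid,
        "skipped_versions": skipped,
        # Hypothetical flag: marks advisories worth re-importing once parsing improves.
        "is_partially_parsed": bool(skipped),
    }
```

For a list like the one in PYSEC-2021-859, only `2.0.1rc2-git` would land in `skipped_versions` under this stand-in check, and the flag would identify the advisory as a candidate for re-import later.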
