Skip to content

Conversation

@AnirudhJindal
Copy link

Context

This is a small proof-of-concept for #698, implementing option 3 from our discussion:

an MD5 hash of the normalized schema + test data + valid flag.

The goal is:

  • Same test across dialects → same ID
  • IDs change automatically when the semantics of a test change
  • Fully checkable in CI by recomputing the ID and comparing

normalize.js

  • Imported and adapted the AST-builder gist from @jdesrosiers.
  • Added a handler for the dialect-specific keyword draft-2020-12/dynamicRef.
  • Stripped top-level file: $id during normalization so registerSchema() doesn’t reject test schemas.
  • Exposed normalize() to return a stable, dialect-aware, $ref-resolved schema representation.

load-remotes.js

  • Recursively loaded all remotes/ JSON files and registered them under http://localhost:1234/....
  • Added deduplication so the same remote URL is never registered twice.
  • Ensured remotes only load for matching dialects.

add-test-ids.js

  • For each test missing an id, normalized its schema.
  • Generated an ID using md5(normalizedSchema + data + valid).
  • Wrote the updated tests back to disk (in this PR: only enum.json).

check-test-ids.js

  • Re-normalized schema and recomputed IDs for each test.
  • Reported missing, mismatched, or duplicate IDs.
  • Exited with error when inconsistencies were found (CI-ready behavior).

enum.json

  • Added id fields to each test using the MD5-based formula.
  • No other changes to schema structure or test contents.

How to run

# Add IDs to enum.json
node scripts/add-test-ids.js tests/draft2020-12/enum.json
# Verify IDs within draft2020-12
node scripts/check-test-ids.js tests/draft2020-12

For now, only enum.json has IDs. The checker will still report "Missing ID"
for other files in draft2020-12

Questions for reviewers

If this direction looks good, I can:

Extend the tooling as needed based on feedback

Add a small GitHub Action that runs check-test-ids.js on the touched version

Then start updating more files, one small PR at a time.

@AnirudhJindal AnirudhJindal marked this pull request as ready for review November 25, 2025 10:38
@AnirudhJindal AnirudhJindal requested a review from a team as a code owner November 25, 2025 10:38
@jdesrosiers jdesrosiers marked this pull request as draft November 25, 2025 18:37
@jdesrosiers
Copy link
Member

This should have been opened as draft PR. I've converted it this time. Next time you open a PR that's a work in progress for the purpose of getting feedback, a draft PR is the way to go.

Copy link
Member

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the first file. I'll have to get to the rest another time, but that should help clear up some misunderstandings and hopefully, you can apply some of that to the rest as well.

One more thing I noticed is that you're using the wrong indentation for JSON files. In this repo, the convention is 4 spaces. Make sure you're matching that convention or it makes your changes very difficult to review. (2 spaces for JavaScript is fine. Just JSON needs to be 4 spaces.)

Comment on lines +35 to +38
if (!Array.isArray(tests)) {
console.log("Expected an array at top level, got:", typeof tests);
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary. There's already automation to check the test file against the schema. By the time we run this script, we should be confident that the test file is in the right format.

}

for (const testCase of tests) {
if (!Array.isArray(testCase.tests)) continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More unnecessary defensive programming.

for (const testCase of tests) {
if (!Array.isArray(testCase.tests)) continue;

const dialectUri = getDialectUri(testCase.schema || {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary defensive programming.

Suggested change
const dialectUri = getDialectUri(testCase.schema || {});
const dialectUri = getDialectUri(testCase.schema);

if (!Array.isArray(testCase.tests)) continue;

const dialectUri = getDialectUri(testCase.schema || {});
const normalizedSchema = await normalize(testCase.schema || true, dialectUri);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary defensive programming.

Suggested change
const normalizedSchema = await normalize(testCase.schema || true, dialectUri);
const normalizedSchema = await normalize(testCase.schema, dialectUri);

Comment on lines +7 to +20
const DIALECT_MAP = {
"https://json-schema.org/draft/2020-12/schema": "https://json-schema.org/draft/2020-12/schema",
"https://json-schema.org/draft/2019-09/schema": "https://json-schema.org/draft/2019-09/schema",
"http://json-schema.org/draft-07/schema#": "http://json-schema.org/draft-07/schema#",
"http://json-schema.org/draft-06/schema#": "http://json-schema.org/draft-06/schema#",
"http://json-schema.org/draft-04/schema#": "http://json-schema.org/draft-04/schema#"
};

function getDialectUri(schema) {
if (schema.$schema && DIALECT_MAP[schema.$schema]) {
return DIALECT_MAP[schema.$schema];
}
return "https://json-schema.org/draft/2020-12/schema";
}
Copy link
Member

@jdesrosiers jdesrosiers Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of issues here.

First, you can't use 2020-12 as the default dialect. The default needs to be the dialect associated with the directory the test file is in. For example, the convention to use $schema in every test case schema was only adopted starting in 2019-09. So, tests for draft-04/6/7 don't have $schema. That means this script will treat all of those tests as 2020-12 schemas, which would be very wrong.

This whole function should be unnecessary anyway. This is already handled internally by @hyperjump/json-schema. The result of this function is only used to pass to the registerSchema function, which should be the dialect from the directory the test file is in, not from the schema.

.digest("hex");
}

async function addIdsToFile(filePath) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should need to pass the dialect based on the directory to this function.

}
}

if (changed) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need a changed flag? Can't you just check if added is greater than 0?

Comment on lines +63 to +71
// Load remotes for all dialects
const remotesPaths = ["./remotes"];
for (const dialectUri of Object.values(DIALECT_MAP)) {
for (const path of remotesPaths) {
if (fs.existsSync(path)) {
loadRemotes(dialectUri, path);
}
}
}
Copy link
Member

@jdesrosiers jdesrosiers Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not going to be able to add remotes for all dialects at the same time. You're going to have to run this script separately for each dialect. Most remotes don't have $schema and are expected to run as the dialect of test case that's being tested. So, you can't load them at the same time because you can't have two schemas with the same id and different dialects. Your edits to loadRemotes are a result of this misunderstanding. I don't think you should need to make changes to that function.

I suggest making the first argument of the script the local draft identifier, then convert that to the right dialect URI to pass to addIdsToFile. Then the second argument can be the optional filePath, where all files are processed if the none is given.

}

const filePath = process.argv[2] || "tests/draft2020-12/enum.json";
addIdsToFile(filePath).catch(console.error); No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catch is unnecessary here. Just let it throw if there's an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants