4 changes: 2 additions & 2 deletions package-lock.json

15 changes: 12 additions & 3 deletions src/index.ts
@@ -120,11 +120,20 @@ export const index = async (input: string = process.argv[2], options: IndexOptio
];

const shouldRecheck = (status: Status, lastCheckedAt: string) => {
const shouldIndexIt = indexableStatuses.includes(status);
if (status !== Status.SubmittedAndIndexed) {
Owner

This seems wrong: there are some statuses that should not force a recheck.

Or am I missing something?

Author

Ah, you're right, I overfit to my own issue.

My cache was saving "status": "Not found (404)", which is not currently part of the Status enum, so those URLs were being skipped.

Instead, I should just add a new Status.NotFound to indexableStatuses.

However, problems like these can be fixed quickly, but because of the long cache timeout (CACHE_TIMEOUT) they'd be ignored for 2 weeks. Perhaps we can have a shorter timeout for NotFound or PageWithRedirect.

What do you think?
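
A rough sketch of the shorter-timeout idea, not tested against the real code. It assumes hypothetical names (`DEFAULT_CACHE_TIMEOUT`, `CACHE_TIMEOUT_PER_STATUS`, the extra `now` parameter) and made-up timeout values; apart from "Not found (404)", the enum string values and the contents of `indexableStatuses` are guesses as well:

```typescript
// Sketch only: apart from "Not found (404)", the enum string values are assumptions.
enum Status {
  SubmittedAndIndexed = "Submitted and indexed",
  CrawledCurrentlyNotIndexed = "Crawled - currently not indexed",
  PageWithRedirect = "Page with redirect",
  NotFound = "Not found (404)",
}

// Assumed subset of the real list, with the proposed Status.NotFound added.
const indexableStatuses: Status[] = [
  Status.CrawledCurrentlyNotIndexed,
  Status.PageWithRedirect,
  Status.NotFound,
];

const DAY_MS = 24 * 60 * 60 * 1000;

// Default matches the 2 weeks mentioned above.
const DEFAULT_CACHE_TIMEOUT = 14 * DAY_MS;

// Hypothetical shorter timeouts for statuses that are usually fixed quickly.
const CACHE_TIMEOUT_PER_STATUS: Partial<Record<Status, number>> = {
  [Status.NotFound]: 1 * DAY_MS,
  [Status.PageWithRedirect]: 1 * DAY_MS,
};

// `now` is injectable only to make the function easy to test.
const shouldRecheck = (status: Status, lastCheckedAt: string, now: number = Date.now()): boolean => {
  // Statuses outside the indexable list are never rechecked.
  if (!indexableStatuses.includes(status)) {
    return false;
  }
  // Fall back to the default timeout when no per-status override exists.
  const timeout = CACHE_TIMEOUT_PER_STATUS[status] ?? DEFAULT_CACHE_TIMEOUT;
  return new Date(lastCheckedAt).getTime() < now - timeout;
};
```

With this shape, a URL cached as NotFound two days ago would be rechecked, while one cached as SubmittedAndIndexed would still be skipped, which keeps the behaviour you were worried about.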

return true;
}
const isOld = new Date(lastCheckedAt) < new Date(Date.now() - CACHE_TIMEOUT);
return shouldIndexIt && isOld;
return isOld;
};

const urlsToProcess = pages.filter((url) => {
const result = statusPerUrl[url];
return !result || shouldRecheck(result.status, result.lastCheckedAt);
});

console.log(`👉 Found ${urlsToProcess.length} URLs that need processing out of ${pages.length} total URLs`);

await batch(
async (url) => {
let result = statusPerUrl[url];
@@ -136,7 +145,7 @@ export const index = async (input: string = process.argv[2], options: IndexOptio

pagesPerStatus[result.status] = pagesPerStatus[result.status] ? [...pagesPerStatus[result.status], url] : [url];
},
pages,
urlsToProcess,
50,
(batchIndex, batchCount) => {
console.log(`📦 Batch ${batchIndex + 1} of ${batchCount} complete`);