-
Notifications
You must be signed in to change notification settings - Fork 521
fix: Fix crawler_runtime not being updated during run and only in the end
#1540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| async def __aenter__(self) -> Self: | ||
| self._active = True | ||
| await self._state.initialize() | ||
| self._after_initialize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what this was for these dummy statistics. Could you please double-check @janbuchar ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was probably so that record_* methods wouldn't randomly fail during context pipeline execution because of incorrectly initialized state. I assume it's not necessary anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen any failure in tests, and those methods execute. But the same can be seen on master when deleting this line, so I guess it was made redundant by some other change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably 🤞
janbuchar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems legit, just one comment readability issue
| async def __aenter__(self) -> Self: | ||
| self._active = True | ||
| await self._state.initialize() | ||
| self._after_initialize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was probably so that record_* methods wouldn't randomly fail during context pipeline execution because of incorrectly initialized state. I assume it's not necessary anymore?
| # Flag to indicate the context state. | ||
| self._active = False | ||
|
|
||
| # Pre-existing runtime offset when importing existing statistics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does importing existing statistics mean here? Like restoring serialized state from KVS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, updated comment.
vdusek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
BasicCrawler.statistics.state.crawler_runtimeto be properly updated on eachBasicCrawler.statistics.calculate()call when the statistics are still active.BasicCrawler.statistics.state.crawler_runtimeonBasicCrawler.statistics.calculate()when the statistics are already deactivated - to avoid confusing differences between logged and persisted state.Issues
Closes: #1541
Testing
Checklist