capacity: fix duplicate topology #1435
base: master
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: huww98 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
de81be0 to b80ae72
|
/cc @pohly |
New segments get produced in Suppose there are two work queue items in the queue at a time when Two solutions:
|
Yes, I think this controller is designed to only run one worker external-provisioner/pkg/capacity/topology/nodes.go Lines 204 to 210 in b84b08f
It is not configurable, and we currently start only one worker goroutine. |
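The point about a single worker can be illustrated with a minimal, self-contained sketch. It does not use the real client-go work queue; `runWorkers` and the channel-backed queue are hypothetical stand-ins. It only shows that items drained by one goroutine are never processed concurrently. Note that this guarantee covers only work that goes through the queue: a sync() invoked directly from another code path would still run in parallel with the worker.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers drains `items` queued keys with `workers` goroutines and
// returns the highest number of items that were in flight at once.
// Hypothetical helper for illustration only.
func runWorkers(workers, items int) int32 {
	queue := make(chan string, items)
	for i := 0; i < items; i++ {
		queue <- "" // same empty key as in the diff under review
	}
	close(queue)

	var running, maxRunning int32
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				n := atomic.AddInt32(&running, 1)
				// Record the high-water mark of concurrent "sync" calls.
				for {
					m := atomic.LoadInt32(&maxRunning)
					if n <= m || atomic.CompareAndSwapInt32(&maxRunning, m, n) {
						break
					}
				}
				atomic.AddInt32(&running, -1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&maxRunning)
}

func main() {
	// With exactly one worker, at most one item is ever in flight.
	fmt.Println(runWorkers(1, 100)) // prints 1
}
```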
    if nt.upstreamSynced() {
        // Now that both informers are up-to-date,
        // trigger a sync to update the list of topology segments.
        nt.queue.Add("")
Nit: I think this is a common way to trigger a sync, but isn't it rather difficult to debug with logging? Can the key be something more explicit, like "full-sync" or "reconcile-all"?
But if it is a common pattern among our sidecars, ignore this comment.
When the controller starts, two sync() calls run simultaneously: one from HasSynced(), the other from processNextWorkItem(). Each produces its own instance for the same topology segment and passes it to the callbacks. This leads to duplicate entries in the capacities map, with one of two outcomes:
- Two CSIStorageCapacity objects get created for the same topology, or
- The same CSIStorageCapacity object gets assigned to two keys in the capacities map. When one of them is updated, the other holds an outdated object and all subsequent updates fail with a conflict.
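The duplication described in this commit message can be reproduced with a small sketch. It assumes that the map key identifies a segment by pointer, so two separately allocated instances with identical content count as different keys. `Segment`, `syncOnce` and `countPointerKeys` are hypothetical names for illustration, not the types used in external-provisioner.

```go
package main

import "fmt"

// Segment is a hypothetical stand-in for a topology segment.
type Segment struct {
	Zone string
}

// syncOnce mimics one sync() pass that allocates a fresh *Segment for
// every zone it sees, without reusing an existing instance.
func syncOnce(zones []string) []*Segment {
	out := make([]*Segment, 0, len(zones))
	for _, z := range zones {
		out = append(out, &Segment{Zone: z})
	}
	return out
}

// countPointerKeys stores the segments of all passes in a map keyed by
// pointer and returns how many distinct keys result.
func countPointerKeys(passes ...[]*Segment) int {
	capacities := map[*Segment]string{}
	for _, pass := range passes {
		for _, s := range pass {
			capacities[s] = s.Zone
		}
	}
	return len(capacities)
}

func main() {
	zones := []string{"zone-a"}
	first := syncOnce(zones)  // e.g. the pass triggered during startup
	second := syncOnce(zones) // e.g. the pass run by the worker

	// Same topology, but two distinct pointers: two map entries.
	fmt.Println(countPointerKeys(first, second)) // prints 2
	// Reusing the same instances yields a single entry.
	fmt.Println(countPointerKeys(first, first)) // prints 1
}
```

Under this assumption, the fix has to ensure that only one instance per segment is ever handed to the callbacks, which is what serializing the sync() calls achieves.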
b80ae72 to 118064d
|
Here, only once. |
|
It wasn't designed to be run only once; that's just how it's currently being done. But as that apparently is sufficient, the fix can be pretty simple:
Wouldn't that solve the problem without all of the complicated proposed back-and-forth between event handlers and sync loop? |
Then we will lose the ability to sync partial data from the upstream controllers. I'm not sure whether that is good. Given the current implementation, I don't think an incremental sync is faster than a full sync, but this may delay the first topology being passed to the callbacks. And together with this, I think we need to move
We still need a |
It's normal that controllers wait for a full cache sync before starting their work. It depends a bit on the controller whether it makes sense to start earlier.
Is someone checking that? I don't remember. |
|
external-provisioner/pkg/capacity/capacity.go Line 276 in b84b08f
checked here, via So it seems fine to only sync after upstream has synced, because the controller is still waiting for sync. |
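The approach agreed on here, enqueueing a full sync only once every upstream informer has synced, can be sketched as follows. This is a simplified illustration and not the controller's actual code: the `topology` type, its slice-backed queue and `onUpstreamEvent` are hypothetical, and only the `upstreamSynced()` check and the empty queue key are taken from the diff under review.

```go
package main

import "fmt"

// topology is a hypothetical, simplified stand-in for the controller.
type topology struct {
	informersSynced []func() bool // one "has synced" check per upstream informer
	queue           []string      // stand-in for the real work queue
}

// upstreamSynced reports whether all upstream informers have synced.
func (t *topology) upstreamSynced() bool {
	for _, synced := range t.informersSynced {
		if !synced() {
			return false
		}
	}
	return true
}

// onUpstreamEvent is what an informer event handler would call. It never
// runs sync() itself; it only enqueues work, and only once all upstream
// data is complete.
func (t *topology) onUpstreamEvent() {
	if t.upstreamSynced() {
		t.queue = append(t.queue, "")
	}
}

func main() {
	firstReady, secondReady := false, false
	nt := &topology{informersSynced: []func() bool{
		func() bool { return firstReady },
		func() bool { return secondReady },
	}}

	nt.onUpstreamEvent()
	fmt.Println(len(nt.queue)) // prints 0: upstream not synced yet

	firstReady, secondReady = true, true
	nt.onUpstreamEvent()
	fmt.Println(len(nt.queue)) // prints 1: a full sync is now queued
}
```

Because every sync is routed through the queue, the single worker remains the only caller of sync(), at the cost of not passing any topology to the callbacks before the upstream caches are complete.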
What type of PR is this?
/kind bug
What this PR does / why we need it:
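When the controller starts, two sync() calls run simultaneously: one from HasSynced(), the other from processNextWorkItem(). Each produces its own instance for the same topology segment and passes it to the callbacks.
This leads to duplicate entries in the capacities map, with one of two outcomes:
- Two CSIStorageCapacity objects get created for the same topology, or
- The same CSIStorageCapacity object gets assigned to two keys in the capacities map. When one of them is updated, the other holds an outdated object and all subsequent updates fail with a conflict.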
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: