
Conversation


@huww98 huww98 commented Oct 30, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

When the controller starts, two sync() calls run simultaneously, one from HasSynced() and one from processNextWorkItem(). Each produces its own instance for the same topology segment and passes it to the callbacks.

This results in duplicated entries in the capacities map, leading to either:

  • Two CSIStorageCapacity objects get created for the same topology, or
  • The same CSIStorageCapacity object gets assigned to two keys in the capacities map. When one of them is updated, the other holds an outdated object, and all subsequent updates fail with a conflict.
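To illustrate the root cause, here is a minimal standalone sketch (not the sidecar's actual code) of why a pointer-keyed map picks up duplicates: two distinct *Segment allocations with identical content are two distinct keys, so both survive in the capacities map.

package main

import "fmt"

// Segment stands in for the sidecar's topology segment type; the point
// is only that the capacities map is keyed by pointer, not by value.
type Segment struct{ Zone string }

func main() {
	capacities := map[*Segment]string{}

	// Two concurrent sync() calls each allocate their own instance
	// for the same topology segment ...
	a := &Segment{Zone: "zone-1"}
	b := &Segment{Zone: "zone-1"}

	// ... so the map ends up with two entries for one topology.
	capacities[a] = "csisc-1"
	capacities[b] = "csisc-2"

	fmt.Println(len(capacities)) // prints 2
}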

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fixed possibly duplicated CSIStorageCapacity objects and constantly failing update requests.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 30, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huww98
Once this PR has been reviewed and has the lgtm label, please assign pohly for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 30, 2025
@huww98 huww98 force-pushed the fix-duplicate-capacity branch from de81be0 to b80ae72 Compare October 30, 2025 14:34

huww98 commented Oct 30, 2025

/cc @pohly

@k8s-ci-robot k8s-ci-robot requested a review from pohly October 30, 2025 17:28

pohly commented Nov 24, 2025

When the controller starts, two sync() calls run simultaneously, one from HasSynced() and one from processNextWorkItem(). Each produces its own instance for the same topology segment and passes it to the callbacks.

New segments get produced in sync, right? So whenever two sync calls are executed in parallel, we have this problem. I agree that this is faulty. What isn't clear to me is the proposed solution.

Suppose there are two work queue items in the queue at a time when nt.hasSynced is still false. Both get processed in parallel. Don't we still have the problem?

Two solutions:

  • only run one worker
  • serialize the code which generates new segment pointers (not exactly sure though how long the mutex must be held for that; see the sketch below)
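A rough sketch of the second option, with hypothetical names (getOrCreateSegment, nt.segments) that do not exist in the sidecar; it only shows where the mutex would have to sit so that concurrent sync() calls cannot mint two pointers for one topology:

package topology

import "sync"

// Segment is a stand-in for the real topology segment type.
type Segment struct{ Label, Value string }

type nodeTopology struct {
	mutex    sync.Mutex
	segments map[string]*Segment // must be initialized by the constructor
}

// getOrCreateSegment is a hypothetical helper: all segment-pointer
// creation is funneled through one mutex, so a second concurrent caller
// gets the existing pointer back instead of allocating a duplicate.
func (nt *nodeTopology) getOrCreateSegment(key string, create func() *Segment) *Segment {
	nt.mutex.Lock()
	defer nt.mutex.Unlock()
	if seg, ok := nt.segments[key]; ok {
		return seg
	}
	seg := create()
	nt.segments[key] = seg
	return seg
}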


huww98 commented Nov 24, 2025

only run one worker

Yes, I think this controller is designed to only run one worker:

func (nt *nodeTopology) RunWorker(ctx context.Context) {
	klog.Info("Started node topology worker")
	defer klog.Info("Shutting node topology worker")
	for nt.processNextWorkItem(ctx) {
	}
}

It is not configurable, and we currently only start one worker goroutine.

if nt.upstreamSynced() {
	// Now that both informers are up-to-date,
	// trigger a sync to update the list of topology segments.
	nt.queue.Add("")

Nit: I think this is a common way to trigger a sync, but it is kind of difficult to debug with logging. Can the key be something more explicit, like "full-sync" or "reconcile-all"?

But if it is a common pattern among our sidecars, ignore this comment.

@huww98 huww98 force-pushed the fix-duplicate-capacity branch from b80ae72 to 118064d Compare November 25, 2025 06:18

pohly commented Nov 25, 2025

Yes, I think this controller is designed to only run one worker

RunWorker executes one worker, but could be invoked more than once. Where is it called?


huww98 commented Nov 25, 2025

Yes, I think this controller is designed to only run one worker

RunWorker executes one worker, but could be invoked more than once. Where is it called?

go topologyInformer.RunWorker(ctx)

Here, only once.


pohly commented Nov 25, 2025

It wasn't designed to be run only once; that's just how it is currently being used. But as that apparently is sufficient, the fix can be pretty simple:

  • document that RunWorker must only be called once
  • in RunWorker, block waiting for informer sync
  • populate one work queue item
  • run the for loop

Wouldn't that solve the problem without all of the complicated back-and-forth between the event handlers and the sync loop that this PR proposes?
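For concreteness, a rough sketch of those four steps, reusing the upstreamSynced helper quoted above; the wait.PollUntilContextCancel call and the one-second interval are assumptions for illustration, not the actual patch:

// RunWorker must only be called once: it owns the one-time initial sync.
func (nt *nodeTopology) RunWorker(ctx context.Context) {
	klog.Info("Started node topology worker")
	defer klog.Info("Shutting node topology worker")

	// Block until the upstream informers are up-to-date.
	if err := wait.PollUntilContextCancel(ctx, time.Second, true,
		func(context.Context) (bool, error) { return nt.upstreamSynced(), nil },
	); err != nil {
		return // context cancelled
	}

	// Exactly one work queue item triggers the initial full sync.
	nt.queue.Add("")

	for nt.processNextWorkItem(ctx) {
	}
}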


huww98 commented Nov 25, 2025

  • in RunWorker, block waiting for informer sync

Then we will lose the ability to sync partial data from the upstream controllers. Not sure if this is good. Given the current implementation, I think an incremental sync is not faster than a full sync, but this may delay the first topology being passed to the callbacks.

And together with this, I think we need to move go topologyInformer.RunWorker(ctx) to after we start the other informers, or we will poll the not-yet-started informers forever if we are not the leader.

Wouldn't that solve the problem without all of the complicated proposed back-and-forth between event handlers and sync loop?

We still need a hasSynced atomic.Bool to tell the outside that we have finished at least one loop (sketched below).
IMO, your proposal can be simpler, but not by much. I can give it a try.
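For reference, a minimal sketch of that flag, assuming it is set at the end of sync() (the placement is an assumption, not the final patch):

// New field on nodeTopology (import "sync/atomic"):
//     hasSynced atomic.Bool

// HasSynced reports whether at least one full sync() pass has completed.
// cache.WaitForCacheSync can poll this just like an informer's HasSynced.
func (nt *nodeTopology) HasSynced() bool {
	return nt.hasSynced.Load()
}

func (nt *nodeTopology) sync(ctx context.Context) {
	// ... existing logic: produce segments, invoke callbacks ...

	// Publish that one full pass has finished.
	nt.hasSynced.Store(true)
}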


pohly commented Nov 25, 2025

Then we will lose the ability to sync partial data from the upstream controllers. Not sure if this is good.

It's normal that controllers wait for a full cache sync before starting their work. It depends a bit on the controller whether it makes sense to start earlier.

We still need a hasSynced atomic.Bool to tell the outside that we have finished at least one loop.

Is someone checking that? I don't remember.


huww98 commented Nov 26, 2025

if !cache.WaitForCacheSync(ctx.Done(), c.topologyInformer.HasSynced, c.scInformer.Informer().HasSynced, c.cInformer.Informer().HasSynced) {

Checked here, via c.topologyInformer.HasSynced.

So it seems fine to only sync after the upstream has synced, because the controller is still waiting for the sync anyway.


huww98 commented Nov 26, 2025

@pohly Please take a look at #1450, which implements your proposed fix.
