
Conversation

@shraddhabang
Collaborator

@shraddhabang shraddhabang commented Nov 6, 2025

Description

This PR adds the AGA (AWS Global Accelerator) deployer, which lets Kubernetes users manage Global Accelerator resources through Kubernetes custom resources. I have split the PR into smaller commits to make it easier to review.

Here is a brief description of each commit:

  1. Setup AGA SDK Client (a39933f)
    Adds AWS Global Accelerator SDK integration with the controller. Implements client interfaces, mock implementations, and cloud provider extensions to interact with the Global Accelerator service. Updates dependencies to include required AWS SDK packages for Global Accelerator.

  2. Add AGA CRD Status Updates Utils (c34f2eb)
    Implements status updating utilities for Global Accelerator CRDs. Provides mechanisms to reflect the state of AWS Global Accelerator resources back to Kubernetes custom resources, enabling users to track resource status, lifecycle events, and error conditions directly in Kubernetes.

  3. Add AGA Tags Reconciler (9484b80)
    Implements tagging functionality for Global Accelerator resources to maintain proper resource ownership and tracking. Provides interfaces and implementations for applying, updating, and reconciling tags on AWS Global Accelerator resources, with comprehensive test coverage. Please note that I have moved the tracking tags from the builders to the deployer. I have also added a region-specific tag so that multiple controllers in the same account but in different regions do not override each other.

  4. Add AGA Controller Config Flags (9695137)
    I have disabled the AGA controller for now, until it is ready for general availability. We will enable it by default when we are ready to release it for production. I have also added exponential back-off duration flags for the AGA controller, which customers can configure. I would also like to know whether we should make the requeue time configurable as well.

  5. Add AGA Deployer (47a7bec)
    Implements the core deployment logic for Global Accelerator resources. This includes:

    • Accelerator Manager for coordinating resource lifecycle operations
    • Synthesizer for transforming Kubernetes model objects into AWS resources
    • Stack deployer for managing resource dependencies and deployment ordering
    • Error handling and lifecycle management for Global Accelerator resources
  6. Update controller-gen version (8dfe522)
    Updates controller-gen to v0.19.0 to fix the failing CRD verifications in CI/CD.
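
To illustrate the tag reconciliation and the region-specific tracking tag described in commit 3, here is a minimal sketch. It is not the code from this PR: the tag keys and the `trackingTags` and `diffTags` helpers are hypothetical names chosen for illustration, and the real reconciler works against the Global Accelerator API rather than plain maps.

```go
package main

import (
	"fmt"
	"sort"
)

// trackingTags builds the ownership tags for one accelerator.
// The tag keys are hypothetical; the PR description does not list the real ones.
// Including the region keeps controllers in different regions of the same
// account from claiming each other's resources.
func trackingTags(clusterName, region, resourceID string) map[string]string {
	return map[string]string{
		"example.k8s.aws/cluster":  clusterName,
		"example.k8s.aws/region":   region,
		"example.k8s.aws/resource": resourceID,
	}
}

// diffTags compares desired and current tags. It returns the tags that must
// be added or updated, and the sorted keys that must be removed.
func diffTags(desired, current map[string]string) (map[string]string, []string) {
	toUpsert := map[string]string{}
	for k, v := range desired {
		if cur, ok := current[k]; !ok || cur != v {
			toUpsert[k] = v
		}
	}
	var toRemove []string
	for k := range current {
		if _, ok := desired[k]; !ok {
			toRemove = append(toRemove, k)
		}
	}
	sort.Strings(toRemove)
	return toUpsert, toRemove
}

func main() {
	desired := trackingTags("prod", "us-west-2", "default/my-accelerator")
	current := map[string]string{
		"example.k8s.aws/cluster": "prod",
		"stale":                   "true",
	}
	toUpsert, toRemove := diffTags(desired, current)
	fmt.Println("upsert:", toUpsert)
	fmt.Println("remove:", toRemove)
}
```

A real reconciler would probably also need a list of tag keys to ignore, so that tags a customer adds by hand are not removed; that is left out here to keep the sketch short.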

TODO: I will add e2e tests once all the resources have been built separately, as this PR was already getting very big.
TODO: I have not added IAM policies yet. I want to add them once we are done with the upcoming release.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes, approved, and size/XXL labels Nov 6, 2025
@shraddhabang shraddhabang force-pushed the agaaccdeployer branch 2 times, most recently from ae22611 to 8dfe522 Compare November 6, 2025 20:53
@shraddhabang shraddhabang force-pushed the agaaccdeployer branch 3 times, most recently from cb1348c to 1a1363d Compare November 10, 2025 01:51
@shraddhabang shraddhabang changed the base branch from main to AGAController November 10, 2025 05:39
return nil
}

acceleratorARN := *ga.Status.AcceleratorARN
Collaborator

I think we need a fallback mechanism. Suppose we provision an accelerator but are unable to persist the ARN. If the customer deletes the resource from the cluster before the next reconcile run, we will orphan the accelerator because of this line.

Collaborator Author

I tried implementing this after your suggestion, but realized that since we would look up the accelerator by its tags, we could get multiple matches if the customer has manually configured the same resource tags on another accelerator (very rare, but possible). We might then delete the wrong resource and disrupt traffic. Since this is an edge case, I think it is better to leave the orphaned resource in place than to risk cleaning up a resource that might be serving traffic. What do you think?
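
To make the trade-off in this thread concrete, here is a minimal sketch of a tag-based fallback lookup that refuses to act when more than one accelerator matches. It is not code from this PR and not a decision reached in the thread: the `accelerator` type, `findByTrackingTags`, and `errAmbiguousMatch` are hypothetical names, and the ARNs are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// accelerator is a hypothetical, simplified view of a Global Accelerator.
type accelerator struct {
	ARN  string
	Tags map[string]string
}

// errAmbiguousMatch signals that the tracking tags do not identify a single accelerator.
var errAmbiguousMatch = errors.New("multiple accelerators match the tracking tags")

// hasAllTags reports whether have contains every key/value pair in want.
func hasAllTags(have, want map[string]string) bool {
	for k, v := range want {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// findByTrackingTags returns the ARN of the only accelerator that carries all
// tracking tags. It returns "" when nothing matches and an error when the
// match is ambiguous, so the caller can skip deletion instead of guessing.
func findByTrackingTags(all []accelerator, tracking map[string]string) (string, error) {
	var matches []string
	for _, acc := range all {
		if hasAllTags(acc.Tags, tracking) {
			matches = append(matches, acc.ARN)
		}
	}
	switch len(matches) {
	case 0:
		return "", nil
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("%w: %v", errAmbiguousMatch, matches)
	}
}

func main() {
	tracking := map[string]string{"example.k8s.aws/resource": "default/my-accelerator"}
	all := []accelerator{
		{ARN: "arn:placeholder:accelerator/one", Tags: tracking},
		{ARN: "arn:placeholder:accelerator/two", Tags: tracking}, // tagged by hand
	}
	arn, err := findByTrackingTags(all, tracking)
	if errors.Is(err, errAmbiguousMatch) {
		fmt.Println("skipping cleanup:", err)
		return
	}
	fmt.Println("safe to clean up:", arn)
}
```

With this shape the ambiguous case still leaves an orphan, as in the current behavior, but the unambiguous case could be cleaned up. Whether that is worth the extra list call on every delete is the open question above.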

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shraddhabang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wweiwei-li
Collaborator

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 11, 2025
@k8s-ci-robot k8s-ci-robot merged commit 719a4cd into kubernetes-sigs:AGAController Nov 11, 2025
9 checks passed

Labels

approved, cncf-cla: yes, lgtm, size/XXL


5 participants