@cazlo cazlo commented Sep 1, 2025

What does this PR do?

Support Pod Identity for EKS add-ons. This is an alternative to IRSA available for many EKS add-ons such as VPC CNI, CloudWatch Observability, EBS-CSI, etc.

Motivation

More

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Yes, I ran pre-commit run -a with this PR

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

Deploying the minimal test case into my account was successful for most services and add-ons, including VPC-CNI.
During the test there were 2 issues seen which do not appear related to the changes in this PR:

  1. Order of operations issue between AWS PrivateCA issuer and cert-manager. This is expected to resolve on second terraform apply, but this was not confirmed for the test.
  2. Issue pulling ingress-nginx chart from GitHub. This is due to flaky internet access where the test was done and is also expected to auto-resolve on second apply.
Screenshot of the end of the terraform apply log (image: Screenshot_20250831_183811)

For prod operations, I usually split out each add-on into their own terraform state, or their own blueprint module instantiation, so that we can order the deploy appropriately using depends_on or other orchestration such as terragrunt dependencies. Doing such a refactor did not seem appropriate for the scope of this PR.
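
As a rough illustration of the ordering approach described above, a minimal sketch follows. The module names and local `source` paths are hypothetical placeholders and are not part of this PR; the only Terraform feature assumed is module-level `depends_on`, available in Terraform 0.13 and later.

```hcl
# Hypothetical layout: each add-on lives in its own module instantiation
# so that the deploy order can be made explicit.
module "cert_manager" {
  source = "./modules/cert-manager" # placeholder path, not from this PR
}

module "aws_privateca_issuer" {
  source = "./modules/aws-privateca-issuer" # placeholder path, not from this PR

  # Wait for everything in the cert_manager module before installing the issuer.
  depends_on = [module.cert_manager]
}
```

With an explicit dependency like this, the issuer would not be applied until cert-manager has finished, which is the kind of ordering problem seen in issue 1 above.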

Despite the apply issues, the cluster was generally in good health, based on inspection of the changed services' logs and the pod overview. The logs also show evidence that EKS Pod Identity is in use.

Click to see pods overview
─[$] k get po -A                                                                                                                                [17:21:11]
NAMESPACE                           NAME                                                              READY   STATUS    RESTARTS   AGE
amazon-cloudwatch                   aws-cloudwatch-metrics-crfmj                                      1/1     Running   0          9m28s
amazon-cloudwatch                   aws-cloudwatch-metrics-dn6jh                                      1/1     Running   0          9m30s
amazon-cloudwatch                   aws-cloudwatch-metrics-h4v9c                                      1/1     Running   0          9m28s
amazon-cloudwatch                   aws-cloudwatch-metrics-kmfrf                                      1/1     Running   0          9m29s
amazon-cloudwatch                   aws-cloudwatch-metrics-lpsjt                                      0/1     Pending   0          11m
amazon-cloudwatch                   aws-cloudwatch-metrics-vg7x9                                      1/1     Running   0          9m29s
argo-rollouts                       argo-rollouts-796798689f-bxmkd                                    1/1     Running   0          11m
argo-rollouts                       argo-rollouts-796798689f-pxmgl                                    1/1     Running   0          11m
argo-workflows                      argo-workflows-server-cf659b454-2l6g5                             1/1     Running   0          12m
argo-workflows                      argo-workflows-workflow-controller-f8765b545-4wgnk                1/1     Running   0          12m
argocd                              argo-cd-argocd-application-controller-0                           1/1     Running   0          12m
argocd                              argo-cd-argocd-applicationset-controller-56789fbb9c-tt4pq         1/1     Running   0          11m
argocd                              argo-cd-argocd-dex-server-655777b6f8-mxld5                        1/1     Running   0          12m
argocd                              argo-cd-argocd-notifications-controller-5f9f85c4d5-xg7g2          1/1     Running   0          12m
argocd                              argo-cd-argocd-redis-568b7d7bf5-w7cjt                             1/1     Running   0          12m
argocd                              argo-cd-argocd-repo-server-fb799bb4c-bwwnx                        1/1     Running   0          11m
argocd                              argo-cd-argocd-server-784d76bf9c-5z7m4                            1/1     Running   0          12m
aws-application-networking-system   aws-gateway-api-controller-aws-gateway-controller-chart-6b4x9mf   1/1     Running   0          11m
aws-application-networking-system   aws-gateway-api-controller-aws-gateway-controller-chart-6b7t4zn   1/1     Running   0          11m
aws-node-termination-handler        aws-node-termination-handler-7c76b9c876-dcvvf                     1/1     Running   0          12m
brupop-bottlerocket-aws             brupop-apiserver-6f9dc8c86c-5g4zh                                 1/1     Running   0          8m25s
brupop-bottlerocket-aws             brupop-apiserver-6f9dc8c86c-8rnqr                                 1/1     Running   0          8m25s
brupop-bottlerocket-aws             brupop-apiserver-6f9dc8c86c-kt9px                                 1/1     Running   0          8m25s
brupop-bottlerocket-aws             brupop-controller-deployment-6d5d9c9b6-qnpxr                      1/1     Running   0          8m25s
cert-manager                        cert-manager-7ddb454967-lclqq                                     1/1     Running   0          11m
cert-manager                        cert-manager-cainjector-766dd9ccf7-zrdqv                          1/1     Running   0          11m
cert-manager                        cert-manager-webhook-9c98cff95-xbjbl                              1/1     Running   0          11m
external-dns                        external-dns-7465775dd-5cphj                                      1/1     Running   0          10m
external-secrets                    external-secrets-6f9dfbc64-zd6xh                                  1/1     Running   0          11m
external-secrets                    external-secrets-cert-controller-5856b74658-87xq7                 1/1     Running   0          10m
external-secrets                    external-secrets-webhook-6bd88c4d75-vqmbc                         1/1     Running   0          8m22s
gatekeeper-system                   gatekeeper-audit-5db6897bbc-lt8dz                                 1/1     Running   0          10m
gatekeeper-system                   gatekeeper-controller-manager-68579f7f77-dfqpq                    1/1     Running   0          10m
gatekeeper-system                   gatekeeper-controller-manager-68579f7f77-k8z6g                    1/1     Running   0          10m
gatekeeper-system                   gatekeeper-controller-manager-68579f7f77-lv5sk                    1/1     Running   0          10m
gpu-operator                        gpu-operator-86765669fc-2wq8b                                     1/1     Running   0          4m34s
gpu-operator                        gpu-operator-node-feature-discovery-gc-555ccf7687-9bwhv           1/1     Running   0          4m34s
gpu-operator                        gpu-operator-node-feature-discovery-master-68d694564d-c2fvm       1/1     Running   0          4m34s
gpu-operator                        gpu-operator-node-feature-discovery-worker-69klz                  1/1     Running   0          3m34s
gpu-operator                        gpu-operator-node-feature-discovery-worker-bx45b                  0/1     Pending   0          3m34s
gpu-operator                        gpu-operator-node-feature-discovery-worker-n9qcq                  1/1     Running   0          4m4s
gpu-operator                        gpu-operator-node-feature-discovery-worker-tvzsh                  1/1     Running   0          4m4s
gpu-operator                        gpu-operator-node-feature-discovery-worker-wpv7m                  1/1     Running   0          3m34s
gpu-operator                        gpu-operator-node-feature-discovery-worker-znpw4                  1/1     Running   0          4m34s
karpenter                           karpenter-cbbd49b99-8g2fv                                         1/1     Running   0          12m
karpenter                           karpenter-cbbd49b99-xkgfz                                         1/1     Running   0          12m
kube-prometheus-stack               alertmanager-kube-prometheus-stack-alertmanager-0                 2/2     Running   0          9m12s
kube-prometheus-stack               kube-prometheus-stack-grafana-d477cd865-9rrdx                     3/3     Running   0          10m
kube-prometheus-stack               kube-prometheus-stack-kube-state-metrics-7cc6fcdf84-dvqlr         1/1     Running   0          10m
kube-prometheus-stack               kube-prometheus-stack-operator-66f4fb967-scwzf                    1/1     Running   0          10m
kube-prometheus-stack               kube-prometheus-stack-prometheus-node-exporter-642l2              1/1     Running   0          9m42s
kube-prometheus-stack               kube-prometheus-stack-prometheus-node-exporter-8hgbm              1/1     Running   0          9m43s
kube-prometheus-stack               kube-prometheus-stack-prometheus-node-exporter-8x6bg              1/1     Running   0          9m43s
kube-prometheus-stack               kube-prometheus-stack-prometheus-node-exporter-k69hn              1/1     Running   0          8m24s
kube-prometheus-stack               kube-prometheus-stack-prometheus-node-exporter-mmdjf              1/1     Running   0          9m44s
kube-prometheus-stack               kube-prometheus-stack-prometheus-node-exporter-pjrmj              1/1     Running   0          9m43s
kube-prometheus-stack               prometheus-kube-prometheus-stack-prometheus-0                     2/2     Running   0          9m12s
kube-system                         aws-for-fluent-bit-5nz7v                                          0/1     Pending   0          11m
kube-system                         aws-for-fluent-bit-fp5x5                                          1/1     Running   0          9m29s
kube-system                         aws-for-fluent-bit-njxl2                                          1/1     Running   0          9m30s
kube-system                         aws-for-fluent-bit-t8xj4                                          1/1     Running   0          9m28s
kube-system                         aws-for-fluent-bit-tzjz5                                          1/1     Running   0          9m29s
kube-system                         aws-for-fluent-bit-wpxqw                                          1/1     Running   0          9m28s
kube-system                         aws-load-balancer-controller-765c8f48c7-s5vbw                     1/1     Running   0          12m
kube-system                         aws-load-balancer-controller-765c8f48c7-z6d9x                     1/1     Running   0          12m
kube-system                         aws-node-4cs5z                                                    2/2     Running   0          7m50s
kube-system                         aws-node-c8tgl                                                    2/2     Running   0          7m36s
kube-system                         aws-node-f55zl                                                    2/2     Running   0          6m57s
kube-system                         aws-node-nwmbr                                                    2/2     Running   0          5m41s
kube-system                         aws-node-s4k8d                                                    2/2     Running   0          8m22s
kube-system                         aws-node-v247j                                                    2/2     Running   0          6m19s
kube-system                         cluster-autoscaler-aws-cluster-autoscaler-78f684744c-ktz8s        1/1     Running   0          12m
kube-system                         coredns-5fbf6db84-28lll                                           1/1     Running   0          8m24s
kube-system                         coredns-5fbf6db84-855pq                                           1/1     Running   0          8m24s
kube-system                         ebs-csi-controller-84b4dfdcb5-kvbsv                               6/6     Running   0          8m22s
kube-system                         ebs-csi-controller-84b4dfdcb5-rdgxz                               6/6     Running   0          8m22s
kube-system                         ebs-csi-node-8htjp                                                3/3     Running   0          8m22s
kube-system                         ebs-csi-node-9rfmr                                                3/3     Running   0          8m22s
kube-system                         ebs-csi-node-q584r                                                3/3     Running   0          8m22s
kube-system                         ebs-csi-node-qtt2j                                                3/3     Running   0          8m22s
kube-system                         ebs-csi-node-rd97g                                                3/3     Running   0          8m22s
kube-system                         ebs-csi-node-rg6nx                                                3/3     Running   0          8m22s
kube-system                         efs-csi-controller-559bddb74d-w4fg2                               3/3     Running   0          12m
kube-system                         efs-csi-controller-559bddb74d-wpj9k                               3/3     Running   0          12m
kube-system                         efs-csi-node-2hg2b                                                3/3     Running   0          9m44s
kube-system                         efs-csi-node-dqjkp                                                3/3     Running   0          9m43s
kube-system                         efs-csi-node-fkjkx                                                3/3     Running   0          9m43s
kube-system                         efs-csi-node-frnqc                                                3/3     Running   0          11m
kube-system                         efs-csi-node-p6fgs                                                3/3     Running   0          9m42s
kube-system                         efs-csi-node-wm9pz                                                3/3     Running   0          9m43s
kube-system                         eks-pod-identity-agent-7jn7t                                      1/1     Running   0          8m24s
kube-system                         eks-pod-identity-agent-jm6vg                                      1/1     Running   0          8m24s
kube-system                         eks-pod-identity-agent-lpvzf                                      1/1     Running   0          8m24s
kube-system                         eks-pod-identity-agent-qgn72                                      1/1     Running   0          8m24s
kube-system                         eks-pod-identity-agent-wbhgg                                      1/1     Running   0          8m24s
kube-system                         eks-pod-identity-agent-x5699                                      1/1     Running   0          8m24s
kube-system                         fsx-csi-controller-75c4d4f47c-qbgk7                               4/4     Running   0          12m
kube-system                         fsx-csi-controller-75c4d4f47c-zj5w6                               4/4     Running   0          12m
kube-system                         fsx-csi-node-7xrrk                                                3/3     Running   0          11m
kube-system                         fsx-csi-node-86jn7                                                3/3     Running   0          9m43s
kube-system                         fsx-csi-node-nbz6b                                                3/3     Running   0          9m43s
kube-system                         fsx-csi-node-q9t2j                                                3/3     Running   0          9m42s
kube-system                         fsx-csi-node-t4q9w                                                3/3     Running   0          9m44s
kube-system                         fsx-csi-node-t7t2q                                                3/3     Running   0          9m43s
kube-system                         kube-proxy-7qbbg                                                  1/1     Running   0          8m3s
kube-system                         kube-proxy-gpntl                                                  1/1     Running   0          8m17s
kube-system                         kube-proxy-kt7v9                                                  1/1     Running   0          8m7s
kube-system                         kube-proxy-mklqx                                                  1/1     Running   0          7m58s
kube-system                         kube-proxy-q7zrk                                                  1/1     Running   0          8m11s
kube-system                         kube-proxy-xcqzj                                                  1/1     Running   0          8m24s
kube-system                         metrics-server-59659dbfc9-8nvzl                                   1/1     Running   0          12m
kube-system                         secrets-store-csi-driver-fdxgc                                    3/3     Running   0          9m43s
kube-system                         secrets-store-csi-driver-nrfwl                                    3/3     Running   0          9m44s
kube-system                         secrets-store-csi-driver-provider-aws-2ctrz                       1/1     Running   0          9m28s
kube-system                         secrets-store-csi-driver-provider-aws-8lj4m                       1/1     Running   0          11m
kube-system                         secrets-store-csi-driver-provider-aws-d67z4                       1/1     Running   0          9m29s
kube-system                         secrets-store-csi-driver-provider-aws-gknfj                       1/1     Running   0          9m29s
kube-system                         secrets-store-csi-driver-provider-aws-rkllz                       1/1     Running   0          9m28s
kube-system                         secrets-store-csi-driver-provider-aws-v958r                       1/1     Running   0          9m30s
kube-system                         secrets-store-csi-driver-pxb8n                                    0/3     Pending   0          10m
kube-system                         secrets-store-csi-driver-txkqw                                    3/3     Running   0          9m44s
kube-system                         secrets-store-csi-driver-zht55                                    3/3     Running   0          9m43s
kube-system                         secrets-store-csi-driver-zn6pj                                    3/3     Running   0          9m43s
prometheus-adapter                  prometheus-adapter-c4647d9ff-gvds4                                1/1     Running   0          4m35s
prometheus-adapter                  prometheus-adapter-c4647d9ff-k6s78                                1/1     Running   0          4m5s
velero                              velero-bbd6b4d4f-w4lmr                                            1/1     Running   0          8m30s
vpa                                 vpa-admission-controller-947bb6566-mhw57                          1/1     Running   0          7m41s
vpa                                 vpa-recommender-7ff6dd9c8b-q8xdf                                  1/1     Running   0          7m41s
vpa                                 vpa-updater-79d776c5dc-tqhc6                                      1/1     Running   0          7m41s

Click to see logs of changed services
─[$] k logs -n kube-system  aws-node-4cs5z --all-containers                                                                                     [17:21:15]
E0831 17:21:21.577160   94276 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://<REDACTED>.gr7.us-west-2.eks.amazonaws.com/api?timeout=32s\": dial tcp 52.42.0.54:443: connect: network is unreachable"
time="2025-09-01T00:13:31Z" level=info msg="Copying CNI plugin binaries ..."
time="2025-09-01T00:13:31Z" level=info msg="Copied all CNI plugin binaries to /host/opt/cni/bin"
time="2025-09-01T00:13:31Z" level=info msg="Found primaryMAC 06:23:ad:5a:09:5d"
time="2025-09-01T00:13:31Z" level=info msg="Found primaryIF ens5"
time="2025-09-01T00:13:31Z" level=info msg="Updated net/ipv4/conf/ens5/rp_filter to 2\n"
time="2025-09-01T00:13:31Z" level=info msg="Updated net/ipv4/tcp_early_demux to 1\n"
time="2025-09-01T00:13:31Z" level=info msg="CNI init container done"
Installed /host/opt/cni/bin/aws-cni
Installed /host/opt/cni/bin/egress-cni
time="2025-09-01T00:13:34Z" level=info msg="Starting IPAM daemon... "
time="2025-09-01T00:13:34Z" level=info msg="Checking for IPAM connectivity... "
time="2025-09-01T00:13:38Z" level=info msg="Copying config file... "
time="2025-09-01T00:13:38Z" level=info msg="Successfully copied CNI plugin binary and config file."

└─[$] k logs -n kube-system  eks-pod-identity-agent-7jn7t                                                                                        [17:21:21]
E0831 17:21:29.836749   94376 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://<REDACTED>.gr7.us-west-2.eks.amazonaws.com/api?timeout=32s\": dial tcp 34.212.234.234:443: connect: network is unreachable"
Defaulted container "eks-pod-identity-agent" out of: eks-pod-identity-agent, eks-pod-identity-agent-init (init)
2025/09/01 00:12:53 Running command:
Command env: (log-file=, also-stdout=false, redirect-stderr=true)
Run from directory: 
Executable path: /eks-pod-identity-agent
Args (comma-delimited): /eks-pod-identity-agent,server,--port,80,--cluster-name,test-env,--probe-port,2703
2025/09/01 00:12:53 Now listening for interrupts
2025/09/01 00:12:53 Setting logging verbosity level to: info (4)
{"bind-addr":"169.254.170.23:80","level":"info","msg":"Starting server...","time":"2025-09-01T00:12:53Z"}
{"bind-addr":"[fd00:ec2::23]:80","level":"info","msg":"Starting server...","time":"2025-09-01T00:12:53Z"}
{"bind-addr":"0.0.0.0:2705","level":"info","msg":"Starting server...","time":"2025-09-01T00:12:53Z"}
{"bind-addr":"localhost:2703","level":"info","msg":"Starting server...","time":"2025-09-01T00:12:53Z"}
{"client-addr":"169.254.170.23:44948","cluster-name":"test-env","level":"info","msg":"handling new request request from 169.254.170.23:44948","time":"2025-09-01T00:13:45Z"}
{"client-addr":"169.254.170.23:44948","cluster-name":"test-env","level":"info","msg":"Calling EKS Auth to fetch credentials","time":"2025-09-01T00:13:45Z"}
{"client-addr":"169.254.170.23:44948","cluster-name":"test-env","fetched_role_arn":"arn:aws:sts::<REDACTED>:assumed-role/aws-vpc-cni-ipv4-20250831235900892400000001/eks-test-env-aws-node-c-da8011c5-04eb-4b95-827c-7ba13507d85d","fetched_role_id":"AROATSO57MX6NTW4NWMJD:eks-test-env-aws-node-c-da8011c5-04eb-4b95-827c-7ba13507d85d","level":"info","msg":"Successfully fetched credentials from EKS Auth","request_time_ms":119,"time":"2025-09-01T00:13:45Z"}
{"client-addr":"169.254.170.23:44948","cluster-name":"test-env","level":"info","msg":"Storing creds in cache","refreshTtl":10800000000000,"time":"2025-09-01T00:13:45Z"}

There were no issues seen during the terraform destroy of the resources.

service_account_role_arn = try(each.value.service_account_role_arn, null)

dynamic "pod_identity_association" {
for_each = try(each.value.pod_identity_association, [])

Is a list appropriate? An addon can only be associated to a single pod identity, right? Maybe an object would be better?

Suggested change
- for_each = try(each.value.pod_identity_association, [])
+ for_each = try(each.value.pod_identity_association, null) != null ? [each.value.pod_identity_association] : []

Author

Hello @lorengordon ,

Thank you for your time in reviewing this PR, as well as your feedback. Let me explain why I think keeping this a list would be best for forward maintainability.

> An addon can only be associated to a single pod identity, right?

It is accurate to say that today there is a 1-1 relationship between an AWS add-on and its IAM role (and the resulting Pod Identity association). This is because each current add-on has a single service account which needs IAM access. I verified this by reviewing both the AWS documentation on add-ons and the pre-baked pod identities under terraform-aws-modules.

However, it is technically possible for a future add-on to have multiple IAM roles/Pod Identities associated with it. This is because the Pod Identity relationship is 1-1 between IAM role and service account, not between IAM role and add-on. A hypothetical example of a 1-to-many add-on to Pod Identity relationship follows. Imagine an add-on which combines the various observability stacks into a single, cooperative stack, i.e. a combination of amazon-managed-service-prometheus, aws-cloudwatch-observability, otel, etc. In this situation, to maintain least-privilege RBAC, the add-on would need at least 2 different IAM role -> service account Pod Identity mappings: one for the prometheus service account, a separate one for the observability service account, and so on.
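
To make the hypothetical concrete, here is a minimal sketch of what a list-valued input could look like for such an add-on. Everything in it is invented for illustration: `combined-observability` is not a real EKS add-on, and the account ID, role names, and service account names are placeholders.

```hcl
locals {
  # Hypothetical add-on with two service accounts, each mapped to its own
  # least-privilege IAM role through a separate Pod Identity association.
  eks_addons = {
    combined-observability = {
      pod_identity_association = [
        {
          role_arn        = "arn:aws:iam::111122223333:role/example-prometheus-writer" # placeholder
          service_account = "prometheus"                                               # placeholder
        },
        {
          role_arn        = "arn:aws:iam::111122223333:role/example-cloudwatch-agent" # placeholder
          service_account = "cloudwatch-agent"                                        # placeholder
        },
      ]
    }
  }
}
```

An object-typed input could only carry one of these two mappings, which is the case the list is meant to keep open.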

Granted, this is entirely hypothetical and may end up being YAGNI. Since the option technically exists, I recommend making this a list, so that if the module needs to support it later we don't have to make a breaking change to the input, with a potential associated major release of the module. All that being said, I am not a maintainer of the project and defer this kind of maintainability decision to y'all; I am cool either way.

Given the context above, please let me know if you wish to proceed with the interface change; I would not mind making the change and re-running tests.

TIL, the api doc would appear to agree with you. It accepts an array instead of an object...

https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateAddon.html#API_CreateAddon_RequestSyntax

Fyi, I'm not a maintainer here, I was just interested in this same feature.
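
For readers following the thread, the list form can be sketched end to end as below. This is an illustration under stated assumptions, not the module's actual code: the variable names, the `addon_name = each.key` mapping, and the example values are assumptions, while `role_arn` and `service_account` are the arguments of the `pod_identity_association` block on `aws_eks_addon` in AWS provider versions that support it.

```hcl
variable "cluster_name" {
  type = string
}

# Illustrative input shape; the real module input may differ.
variable "eks_addons" {
  type = any
  default = {
    vpc-cni = {
      pod_identity_association = [
        {
          role_arn        = "arn:aws:iam::111122223333:role/example-vpc-cni" # placeholder
          service_account = "aws-node"
        },
      ]
    }
  }
}

resource "aws_eks_addon" "this" {
  for_each = var.eks_addons

  cluster_name             = var.cluster_name
  addon_name               = each.key
  service_account_role_arn = try(each.value.service_account_role_arn, null)

  # List form from the PR: zero, one, or many associations per add-on.
  dynamic "pod_identity_association" {
    for_each = try(each.value.pod_identity_association, [])

    content {
      role_arn        = pod_identity_association.value.role_arn
      service_account = pod_identity_association.value.service_account
    }
  }
}
```

With the object form from the suggested change, the input would hold a single object instead and the `for_each` expression would wrap it in a one-element list; the `content` block would stay the same.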

github-actions bot commented Oct 4, 2025

This PR has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this PR will be closed in 10 days

@github-actions github-actions bot added the stale label Oct 4, 2025
@lorengordon

Not stale

@github-actions github-actions bot removed the stale label Oct 5, 2025

github-actions bot commented Nov 5, 2025

This PR has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this PR will be closed in 10 days

@github-actions github-actions bot added the stale label Nov 5, 2025
@lorengordon

Still not stale

@github-actions github-actions bot removed the stale label Nov 6, 2025

Development

Successfully merging this pull request may close these issues.

Add support for optional pod_identity_association for aws_eks_addon
