kubectl auth can-i is the fastest RBAC smoke test I was missing
I used to test Kubernetes permissions the slow way.
Spin up a pod, run a controller, wait for it to fail, inspect logs, guess which RBAC rule is missing, patch, retry.
That approach works, but it burns time and produces noisy failures in the middle of incidents.
The faster path is kubectl auth can-i.
Why this is better during incident response
can-i asks the API server directly whether a given identity can perform a specific action on a specific resource.
No rollout required. No crash loop required. No speculative pod required.
You can impersonate service accounts with --as=system:serviceaccount:<namespace>:<name> and check the exact verb/resource scope your controller needs.
Example:

```shell
kubectl auth can-i \
  list workflows.argoproj.io \
  --as=system:serviceaccount:kubeflow:argo \
  --all-namespaces \
  --context mlinfra-prod
```
If that returns no, you have a deterministic RBAC blocker before you touch deployments.
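When the answer is no and you are not sure which rule is missing, kubectl auth can-i --list dumps everything an identity is allowed to do in a namespace. A minimal wrapper, reusing the service-account names from the example above (the wrapper itself is a sketch, not part of the article's script):

```shell
# Enumerate everything a service account may do in its own namespace,
# instead of probing one verb/resource pair at a time.
list_perms() {
  local ns=$1 sa=$2
  kubectl auth can-i --list \
    --as="system:serviceaccount:${ns}:${sa}" \
    --namespace "${ns}"
}

# e.g. list_perms kubeflow argo
```

The --list output is a table of resources and verbs, which is often enough to spot the one missing rule without guessing.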
Where this helped immediately
In a recent Kubeflow outage, two controllers were crash-looping with RBAC "forbidden" errors.
Instead of continuing the log archaeology, we validated permissions directly for both identities:

- kubeflow/argo → workflow access
- kserve-controller-manager → inference service access
That made the drift obvious: RoleBindings pointed at service accounts in different namespaces across clusters.
We aligned bindings, re-ran can-i, then restarted controllers.
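One quick way to surface that kind of drift: with -o wide, kubectl prints the service accounts each binding targets, so a grep per service account shows which namespace each cluster binds it in. A sketch (the "argo" name is from the example above; this helper is not from the published script):

```shell
# List every (Cluster)RoleBinding that mentions a given service account.
# Differing namespace prefixes in the SERVICEACCOUNTS column across
# clusters are the drift.
bindings_for_sa() {
  local sa=$1
  kubectl get rolebinding,clusterrolebinding --all-namespaces -o wide \
    | grep -E "NAME|${sa}"
}
```

Run it once per cluster context and diff the output.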
Practical smoke test pattern
I now keep a small set of can-i checks for critical controllers and run them after:
- cluster upgrades
- RBAC changes
- controller namespace moves
- restore/reconcile operations
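The loop behind those checks can be sketched as follows; the service-account names, namespaces, and resources here are illustrative, loosely based on the incident above, not a copy of the published script:

```shell
# Run a fixed table of can-i checks and fail if any identity is blocked.
run_checks() {
  local failed=0 c sa verb res
  # format: namespace:serviceaccount verb resource (illustrative entries)
  local checks=(
    "kubeflow:argo list workflows.argoproj.io"
    "kubeflow:kserve-controller-manager get inferenceservices.serving.kserve.io"
  )
  for c in "${checks[@]}"; do
    read -r sa verb res <<<"$c"
    if [ "$(kubectl auth can-i "$verb" "$res" \
          --as="system:serviceaccount:${sa}" --all-namespaces)" != "yes" ]; then
      echo "FAIL: ${sa} cannot ${verb} ${res}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Wiring this into CI or a post-upgrade runbook turns the incident-time diagnostic into a standing guardrail.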
I published the script we now use:
kubeflow-rbac-smoke.sh: https://gist.github.com/fizz/2e64204a5fd8767ced6a4ac247aa4b5f
You can keep this lightweight and still get strong signal. Seven focused checks caught the exact class of drift that had already cost us production time.
If the question is “will this controller be allowed to start cleanly,” can-i should be your first command, not your last resort.