fizz.today

kubectl auth can-i is the fastest RBAC smoke test I was missing

I used to test Kubernetes permissions the slow way.

Spin up a pod, run a controller, wait for it to fail, inspect logs, guess which RBAC rule is missing, patch, retry.

That approach works, but it burns time and produces noisy failures in the middle of incidents.

The faster path is kubectl auth can-i.

Why this is better during incident response

can-i asks the API server directly (it submits a SelfSubjectAccessReview) whether a given identity can perform a specific verb on a specific resource.

No rollout required. No crash loop required. No speculative pod required.

You can impersonate service accounts with --as=system:serviceaccount:<namespace>:<name> and check the exact verb/resource scope your controller needs.

Example:

kubectl auth can-i \
  list workflows.argoproj.io \
  --as=system:serviceaccount:kubeflow:argo \
  --all-namespaces \
  --context mlinfra-prod

If that returns no, you have a deterministic RBAC blocker before you touch deployments.
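The exit code mirrors the answer (0 for yes, non-zero for no), so the same check can gate automation without parsing output. A minimal sketch reusing the example above; the service account, context, and messages are illustrative:

```shell
# can-i exits 0 for "yes" and non-zero for "no", so no output parsing is needed.
rbac_ok() {
  kubectl auth can-i "$@" >/dev/null 2>&1
}

if rbac_ok list workflows.argoproj.io \
     --as=system:serviceaccount:kubeflow:argo \
     --all-namespaces --context mlinfra-prod; then
  echo "RBAC ok - safe to proceed"
else
  echo "RBAC blocked - fix bindings before touching deployments"
fi
```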

Where this helped immediately

In a recent Kubeflow outage, two controllers were crash-looping with RBAC forbidden errors.

Instead of continuing the log archaeology, we checked each controller's permissions directly with can-i.
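The shape of that check, sketched: ask the same question of every cluster context and compare answers side by side. Context and service account names here are placeholders, not the ones from the incident:

```shell
# Ask the same can-i question of every cluster; diverging answers expose drift.
# Context and service account names are placeholders.
check_across_contexts() {
  local ctx answer
  for ctx in "$@"; do
    answer=$(kubectl --context "$ctx" auth can-i list workflows.argoproj.io \
      --as=system:serviceaccount:kubeflow:argo --all-namespaces 2>/dev/null || true)
    echo "$ctx: ${answer:-error}"
  done
}

# check_across_contexts mlinfra-prod mlinfra-staging
```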

That made the drift obvious: across clusters, role bindings pointed at service accounts in different namespaces.

We aligned bindings, re-ran can-i, then restarted controllers.

Practical smoke test pattern

I now keep a small set of can-i checks for critical controllers and run them after every RBAC-affecting change and every new cluster bootstrap.

I published the script we now use.
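A minimal sketch of the pattern, not the published script itself; the service accounts, verbs, and resources in the example invocation are placeholders:

```shell
# Run a fixed list of can-i checks; any failure marks the whole run failed.
# Each entry is "service-account|verb|resource"; names are placeholders.
run_rbac_checks() {
  local failed=0 entry sa verb resource
  for entry in "$@"; do
    IFS='|' read -r sa verb resource <<<"$entry"
    if kubectl auth can-i "$verb" "$resource" \
         --as="$sa" --all-namespaces >/dev/null 2>&1; then
      echo "PASS $sa can $verb $resource"
    else
      echo "FAIL $sa cannot $verb $resource"
      failed=1
    fi
  done
  return "$failed"
}

# Example invocation:
# run_rbac_checks \
#   "system:serviceaccount:kubeflow:argo|list|workflows.argoproj.io" \
#   "system:serviceaccount:kubeflow:ml-pipeline|create|pods"
```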

You can keep this lightweight and still get strong signal. Seven focused checks caught the exact class of drift that had already cost us production time.

If the question is “will this controller be allowed to start cleanly,” can-i should be your first command, not your last resort.

#kubernetes #rbac #kubectl #kubeflow