Owner chain explains lineage. Reconciler chain explains behavior.
I burned a Friday morning editing a Deployment image that kept reverting.
The old image was `gcr.io/ml-pipeline/frontend:2.0.5`; the target was `ghcr.io/kubeflow/kfp-frontend:2.5.0`. The edit applied cleanly, then snapped back. Repeatedly.
The first useful clue came from ownership:
- Deployment ownerRef: `kind: Namespace`, `name: admin`
That looked absurd against my default Kubernetes mental model. I treat a Namespace as a container, not an active parent object. But in this cluster, the Namespace was the parent in a Metacontroller flow that rendered child resources.
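A quick way to surface that clue is to read `ownerReferences` straight off the object metadata. A sketch; the deployment name and namespace here are illustrative:

```shell
# Print each owner as kind/name, one per line.
kubectl -n admin get deployment ml-pipeline-ui \
  -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}{"/"}{.name}{"\n"}{end}'
```

Repeat on each parent to walk the whole chain upward.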
That gave me the principle I wish I had earlier:
Owner chain explains lineage. Reconciler chain explains behavior.
Why owner chain was insufficient
`ReplicaSet -> Deployment -> Namespace` told me where the object sat in the hierarchy. It did not identify every actor that would overwrite it.
A controller can reconcile an object even when it is not the immediate owner in the way you expect. If you only follow ownerRefs, you can still miss the actual source of desired state.
In practice, that means a perfectly executed `kubectl edit deployment` can still be the wrong move.
Fast debugging checklist
- Follow ownerRefs to understand lineage.
- Inspect `managedFields` and controller annotations to identify writers.
- Find controller inputs (`ConfigMap`, CR, Helm values), not just child specs.
- Validate service-account permissions with `kubectl auth can-i`.
Example:
```shell
kubectl --context mlinfra-prod auth can-i \
  list workflows.argoproj.io --all-namespaces \
  --as=system:serviceaccount:kubeflow:argo
```
That command answers in one line what I used to test with ad hoc pods and trial-and-error.
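The writer-identification step is just as mechanical: ownerRefs give lineage, `managedFields` names the actual writers. A sketch, with an illustrative deployment name and namespace:

```shell
# List every field manager that has written to this Deployment,
# along with the operation it used (Apply vs Update).
kubectl -n admin get deployment ml-pipeline-ui \
  --show-managed-fields \
  -o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\t"}{.operation}{"\n"}{end}'
```

A controller showing up here that you did not expect is the overwrite suspect.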
The config trap that looked like metadata
In this stack, the `pipeline-install-config` ConfigMap's `appVersion` was not informational. It fed `KFP_VERSION` into the profile-controller path. If explicit image env overrides were missing, reconciliation could drift child image tags back toward old defaults.
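Reading that intent directly is one command. This assumes the stock Kubeflow Pipelines layout (ConfigMap in the `kubeflow` namespace); adjust for your install:

```shell
# The version the controller will push toward, not the version that is running.
kubectl -n kubeflow get configmap pipeline-install-config \
  -o jsonpath='{.data.appVersion}{"\n"}'
```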
So we had mixed state:
- leaf runtime image already at `2.5.0` in some places
- controller config still carrying `2.0.5` intent in others
That is how incidents feel random when they are actually deterministic.
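Deterministic means checkable. A minimal sketch of the comparison, with the two values hard-coded where a real script would read them from the cluster:

```shell
#!/bin/sh
# Minimal drift check: runtime image tag vs. controller config intent.
# In a real script these come from kubectl; hard-coded here for illustration.
runtime_tag="2.5.0"     # e.g. tag parsed from the Deployment's container image
config_version="2.0.5"  # e.g. appVersion from pipeline-install-config

if [ "$runtime_tag" = "$config_version" ]; then
  echo "aligned: $runtime_tag"
else
  echo "DRIFT: runtime=$runtime_tag config=$config_version"
fi
```

Run it on a schedule and the "random" incident becomes a red line on a dashboard.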
Durable pattern
- Edit controller inputs, not reconciled children.
- Align config intent (`appVersion`, explicit image overrides).
- Trigger reconcile from the parent model when needed.
- Add repeatable health checks so restarts don’t rediscover drift.
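The first three steps can be sketched in two commands. The ConfigMap patch follows the stock KFP layout; the annotation nudge is a generic trick and the key is purely illustrative, since Metacontroller-style setups re-render children on any parent change:

```shell
# 1-2. Fix the source of truth: align the version the controller consumes.
kubectl -n kubeflow patch configmap pipeline-install-config \
  --type merge -p '{"data":{"appVersion":"2.5.0"}}'

# 3. Touch the parent so the reconciler re-renders its children.
#    (Annotation key is illustrative, not a Metacontroller convention.)
kubectl annotate namespace admin reconcile.example.com/touch="$(date +%s)" --overwrite
```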
I published the two scripts we now run for that:
- `kubeflow-version-snapshot.sh`: https://gist.github.com/fizz/307ce198f24c78b55a721f80971e491e
- `kubeflow-rbac-smoke.sh`: https://gist.github.com/fizz/2e64204a5fd8767ced6a4ac247aa4b5f
If a controller manages it, child edits are local anesthesia. The surgery is at source-of-truth.