Silent exit 0 is worse than a crash — when scripts succeed with wrong inputs
I’m running the tenant onboarding populate scripts to seed 38 SSM parameters for a new tenant. The script takes five flags: --envfile, --tenant, --orgcode, --region, --appname. On one run I swap the values of --tenant (should be tenant-ridgeb-7ebfdce288) and --orgcode (should be org_156f18133ee2f). The script exits 0, prints its output banner, and writes all 38 params. On another run I pass --appname tenant_auth_service when it should be apiserver. Same result: exit 0, banner, params written.
I don’t find out until External Secrets Operator starts failing reconciliation. Every ExternalSecret reports Secret does not exist. The params are in SSM, just under paths that nothing is looking for.
Why it works this way
The Python script builds the SSM path prefix from raw flag values:
# Path prefix is built directly from the raw flag values, with no validation.
prefix = f"/platform/dev/tenants/{args.tenant}/{args.appname}/"
for key, value in env_vars.items():
    param_name = f"{prefix}{key}"
    ssm.put_parameter(
        Name=param_name,
        Value=value,
        Type="SecureString",
        Overwrite=True,
    )
ssm.put_parameter writes to any path. There’s no format validation on --tenant, no enum check on --appname, no verification that the path prefix matches an existing tenant namespace. The flags are string concatenation inputs, nothing more.
The shell wrappers — populate_api_server.sh and populate_tenant_auth.sh — compound this. They reference ./env_api_server with a relative path, so running them from any directory other than their own reads an empty or nonexistent envfile, writes zero params, and exits 0. No params written, no error, no indication anything went wrong.
What silent success costs
A script that crashes on bad input is annoying for 30 seconds. You see the traceback, fix the flag, re-run. A script that silently succeeds with bad input costs hours — you don’t know anything is wrong until a downstream system fails, and then you’re debugging the downstream system, not the script that actually broke things.
My debugging path: ESO reconciliation failure, then checking the ExternalSecret status, then checking the ClusterSecretStore, then checking IAM permissions, then finally running aws ssm get-parameters-by-path and seeing the params sitting under /platform/dev/tenants/org_156f18133ee2f/ instead of /platform/dev/tenants/tenant-ridgeb-7ebfdce288/. That’s four wrong hypotheses before reaching the actual cause.
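The SSM listing that finally exposed the problem can be scripted and run right after the populate step instead of last. A minimal sketch, assuming a boto3 SSM client is passed in. The helper names list_param_names and check_expected_path are hypothetical and not part of the onboarding scripts; the path layout is taken from the prefix shown earlier.

```python
def list_param_names(ssm, path):
    """Return the names of all SSM parameters stored under `path`.

    `ssm` is assumed to be a boto3 SSM client, e.g. boto3.client("ssm").
    """
    names = []
    paginator = ssm.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=path, Recursive=True):
        names.extend(p["Name"] for p in page.get("Parameters", []))
    return names


def check_expected_path(ssm, tenant, appname, expected_count):
    """Raise if the expected prefix does not hold the expected number of params."""
    path = f"/platform/dev/tenants/{tenant}/{appname}"
    found = list_param_names(ssm, path)
    if len(found) != expected_count:
        raise RuntimeError(
            f"expected {expected_count} params under {path}, found {len(found)}"
        )
    return found
```

Called with the correct tenant ID, apiserver, and an expected count of 38, this would have raised immediately after the swapped-flags run, since nothing was written under the path ESO reads.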
What a few lines of validation buy you
import re
import sys

# Tenant IDs look like tenant-<name>-<10 hex chars>.
TENANT_RE = re.compile(r"^tenant-[a-z]+-[0-9a-f]{10}$")
VALID_APPNAMES = {"apiserver", "tenant_auth_service"}

# sys.exit with a string prints it to stderr and exits with status 1.
if not TENANT_RE.match(args.tenant):
    sys.exit(f"Invalid --tenant format: {args.tenant!r}")
if args.appname not in VALID_APPNAMES:
    sys.exit(f"Invalid --appname: {args.appname!r} (expected one of {sorted(VALID_APPNAMES)})")
Two checks, four lines. The swapped-flags run dies immediately with a clear message instead of writing 38 params to a path nobody will ever read.
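The same checks can also live in the argument parser, so a bad value is rejected before any other code runs. This is a sketch under assumptions, not the script's actual parser: it assumes the script uses argparse and that all five flags are required, and ORGCODE_RE is inferred from the single org code in this post (org_ plus 13 hex characters). The names matching and build_parser are hypothetical.

```python
import argparse
import re

TENANT_RE = re.compile(r"^tenant-[a-z]+-[0-9a-f]{10}$")
# Assumption: inferred from one example value, not from a documented format.
ORGCODE_RE = re.compile(r"^org_[0-9a-f]{13}$")
VALID_APPNAMES = ("apiserver", "tenant_auth_service")


def matching(pattern, label):
    """Build an argparse `type` callable that rejects non-matching values."""
    def check(value):
        if not pattern.fullmatch(value):
            raise argparse.ArgumentTypeError(f"invalid {label} format: {value!r}")
        return value
    return check


def build_parser():
    parser = argparse.ArgumentParser(description="Seed SSM parameters for a tenant")
    parser.add_argument("--envfile", required=True)
    parser.add_argument("--tenant", required=True, type=matching(TENANT_RE, "tenant"))
    parser.add_argument("--orgcode", required=True, type=matching(ORGCODE_RE, "orgcode"))
    parser.add_argument("--region", required=True)
    parser.add_argument("--appname", required=True, choices=VALID_APPNAMES)
    return parser
```

With this parser, the swapped-flags run exits with status 2 and a usage message. Neither version catches a valid but wrong --appname, such as tenant_auth_service where apiserver was intended; that would need a separate cross-check, for example against the wrapper that invoked the script.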
The relative path fix is even simpler:
SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
envfile="${SCRIPT_DIR}/env_api_server"
Resolve from the script’s own location, not the caller’s $PWD. The caller stays wherever they are — the script reaches out to its own files. No silent empty read, no zero-params-written exit 0.
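Fixing the path handles the wrong-directory case, but an envfile that is genuinely empty would still produce a zero-params exit 0. A guard on the Python side can close that gap. This is a minimal sketch, assuming the envfile is plain KEY=VALUE lines with optional # comments (the post does not show the real format); load_envfile is a hypothetical name.

```python
from pathlib import Path


def load_envfile(path):
    """Parse KEY=VALUE lines; raise if the file is missing or yields no entries."""
    envfile = Path(path)
    if not envfile.is_file():
        raise FileNotFoundError(f"envfile not found: {envfile}")

    env_vars = {}
    for line in envfile.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed line in {envfile}: {line!r}")
        env_vars[key.strip()] = value.strip()

    # An empty result is treated as an error, never as "nothing to do".
    if not env_vars:
        raise ValueError(f"envfile {envfile} contains no KEY=VALUE entries")
    return env_vars
```

Left uncaught, either exception ends the script with a traceback and a nonzero exit status, which is the behavior the wrappers currently lack.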
The principle
Exit 0 is a contract. It means “I did what you asked and it worked.” A script that exits 0 after writing data to the wrong location has violated that contract in the most expensive way possible — it told you everything is fine while creating a problem you won’t discover until something else breaks. Format validation on inputs that become path components is not defensive programming. It’s the minimum bar for a script that claims success.