fizz.today

Three layers of indirection: how API Gateway hides your Lambda version pin

I knew the Lambda was invoking Version 1 instead of $LATEST. I knew Version 1 was frozen without VPC config. What I didn’t know was where the version number lived. There was no architecture doc. There was no one to ask. There was a running cluster and a database I could reach through an ephemeral pod.

The Lambda function name wasn’t hardcoded anywhere I could find. The inference pipeline stores an API Gateway URL in a Postgres deployment table — just a URL, no ARN, no version qualifier. I spun up a throwaway psql container in the same VPC and queried the table expecting to find :1 somewhere in the record. Nothing. Clean URL.

So I pulled the API Gateway integration:

aws apigateway get-integration \
  --rest-api-id abc123 --resource-id xyz789 \
  --http-method ANY --query 'uri'
arn:aws:lambda:...:function:my-func:1/invocations

There it is. :1, baked into the integration URI when the deployment pipeline created the API Gateway. The API Gateway name had it too — dev-lambda-my-func:1-API. The tags had it — deployment: dev-lambda-my-func:1. Written in three places, visible in none of the usual diagnostic paths.
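If you would rather script that lookup than type it, here is a minimal boto3 sketch of the same call. The function name integration_uri is mine, not something from the pipeline, and the IDs are the placeholders from the command above. The client is passed in, so the sketch makes no assumptions about credentials or region.

```python
def integration_uri(apigw, rest_api_id, resource_id, http_method):
    """Return the backend URI of one API Gateway (REST) integration.

    `apigw` is expected to be a boto3 "apigateway" client, for example
    boto3.client("apigateway"). It is passed in rather than created here.
    """
    response = apigw.get_integration(
        restApiId=rest_api_id,
        resourceId=resource_id,
        httpMethod=http_method,
    )
    # For a Lambda integration, this URI embeds the Lambda ARN,
    # including any version or alias qualifier.
    return response["uri"]
```

Called as integration_uri(boto3.client("apigateway"), "abc123", "xyz789", "ANY"), it should return the same string the CLI printed.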

Postgres  →  https://abc123.execute-api.us-east-1.amazonaws.com/dev
                          ↓
API GW    →  arn:aws:lambda:...:function:my-func:1/invocations
                          ↓
Lambda    →  Version 1 (frozen config, no VPC, ConnectTimeoutError)

Then I found the second invocation path. I didn’t have the source repo, but the pods were running, so I kubectl exec’d into the inference container and read the code off disk. The app has a url2arn() function that reads the API Gateway integration at runtime, splits the URI, and extracts the Lambda ARN for direct boto3 invocations:

arn = integration["uri"].split("/")[-2]
# "arn:aws:lambda:us-east-1:123456789:function:my-func:1"

The :1 qualifier comes along for the ride. So the stale version poisons both the HTTP path (client → API Gateway → Lambda:1) and the direct-invoke path (client → url2arn() → boto3 → Lambda:1). Neither path stores the version itself — both inherit it from the integration URI.
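To make the inheritance concrete, here is a small sketch that separates the qualifier from the ARN instead of carrying it along. This is not the app's url2arn(); split_qualifier and is_pinned_version are hypothetical helpers, and they assume the integration URI ends in /<lambda-arn>/invocations, the same shape the split above relies on.

```python
from typing import Optional, Tuple


def split_qualifier(integration_uri: str) -> Tuple[str, Optional[str]]:
    """Return (unqualified Lambda ARN, qualifier or None)."""
    # Same extraction as the snippet above: the second-to-last path segment.
    lambda_arn = integration_uri.split("/")[-2]
    # arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
    fields = lambda_arn.split(":")
    if len(fields) > 7:
        return ":".join(fields[:7]), fields[7]
    return lambda_arn, None


def is_pinned_version(qualifier: Optional[str]) -> bool:
    """True for numeric qualifiers such as "1"; False for aliases or None."""
    return qualifier is not None and qualifier.isdigit()
```

A check like is_pinned_version(split_qualifier(uri)[1]) would have flagged :1 in both invocation paths, since both start from the same integration URI.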

update-function-configuration only changes $LATEST; it doesn’t touch API Gateway. publish-version snapshots $LATEST into a new immutable version; it doesn’t touch API Gateway either. The version pin survives every Lambda operation and is invisible until you specifically call get-integration.

Updating the integration from :1 to :2 took three commands: update-integration to rewrite the URI, add-permission on the new version (resource policy statements are scoped to a qualifier, so :2 doesn’t inherit the invoke permission granted to :1), and create-deployment to push the change to the stage. Thirty seconds to apply, six hours to find.
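Those three commands map onto three boto3 calls. The sketch below is an approximation, not what I ran verbatim: requalify and repoint_integration are names I made up, the clients are passed in, and source_arn and statement_id are values you would have to supply for your own API. It assumes the integration URI ends in /<lambda-arn>/invocations.

```python
def requalify(uri, new_qualifier):
    """Swap the Lambda qualifier inside an integration URI.

    Returns (new URI, unqualified Lambda ARN). Works whether or not the
    embedded ARN currently carries a qualifier.
    """
    prefix, lambda_arn, suffix = uri.rsplit("/", 2)
    base_arn = ":".join(lambda_arn.split(":")[:7])
    return f"{prefix}/{base_arn}:{new_qualifier}/{suffix}", base_arn


def repoint_integration(apigw, lam, *, rest_api_id, resource_id, http_method,
                        stage_name, new_qualifier, source_arn, statement_id):
    """Point one integration at a different Lambda version or alias.

    `apigw` and `lam` are expected to be boto3 "apigateway" and "lambda"
    clients. `source_arn` is the execute-api ARN allowed to invoke.
    """
    old_uri = apigw.get_integration(
        restApiId=rest_api_id, resourceId=resource_id, httpMethod=http_method
    )["uri"]
    new_uri, base_arn = requalify(old_uri, new_qualifier)

    # 1. update-integration: rewrite the URI in place.
    apigw.update_integration(
        restApiId=rest_api_id, resourceId=resource_id, httpMethod=http_method,
        patchOperations=[{"op": "replace", "path": "/uri", "value": new_uri}],
    )
    # 2. add-permission: grant invoke on the new qualifier specifically.
    lam.add_permission(
        FunctionName=base_arn, Qualifier=new_qualifier,
        StatementId=statement_id, Action="lambda:InvokeFunction",
        Principal="apigateway.amazonaws.com", SourceArn=source_arn,
    )
    # 3. create-deployment: nothing reaches the stage until this runs.
    apigw.create_deployment(restApiId=rest_api_id, stageName=stage_name)
    return new_uri
```

Skipping step 3 is the easy mistake: the integration reads back as updated while the deployed stage keeps serving the old one.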

While I was reading code off that container, I found a scale_lambda() function that creates aliases for provisioned concurrency, 50 lines away from the code that created the integration. If the deployment path had used my-func:live instead of my-func:1, the API Gateway integration would have survived every configuration change without anyone touching it: publish a new version, repoint the alias, done. update-alias is one command. The pattern was right there. Nobody had wired it in.
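Here is roughly what that wiring could look like, as a sketch rather than anything from the actual codebase. promote is a hypothetical helper; it assumes the live alias already exists (create-alias, once), that the integration URI points at my-func:live, and that the alias has its own invoke permission for API Gateway.

```python
def promote(lam, function_name, alias_name="live"):
    """Publish the current $LATEST and point the alias at the new version.

    `lam` is expected to be a boto3 "lambda" client. API Gateway is never
    touched: the integration keeps referencing the alias.
    """
    version = lam.publish_version(FunctionName=function_name)["Version"]
    lam.update_alias(
        FunctionName=function_name,
        Name=alias_name,
        FunctionVersion=version,
    )
    return version
```

With this in the deployment path, a config fix becomes update-function-configuration followed by promote(lam, "my-func"), and the integration URI never changes.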

#aws #lambda #api-gateway #debugging