fizz.today

Your Lambda VPC migration worked. Your published versions didn’t.

I moved four Lambda functions into a VPC on Monday. update-function-configuration with the new subnets and security group, LastUpdateStatus: Successful, quick aws lambda invoke to confirm — all clean. Tuesday morning, 69 ConnectTimeoutErrors in CloudWatch.

The signal was in the START lines. Every Lambda invocation logs which version it ran:

START RequestId: abc-123 Version: 1

Not $LATEST. Version 1. Published three days before the VPC migration.
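To see how traffic splits across versions rather than eyeballing a single START line, the log messages can be tallied. This is a minimal sketch: `count_versions` is a hypothetical helper name, and the commented usage assumes the default `/aws/lambda/<function-name>` log group.

```shell
# Tally invocations per version from Lambda START log lines read on stdin.
# Works whether the messages arrive one per line or joined on one line,
# because grep -o emits each match separately.
count_versions() {
  grep -o 'Version: [^[:space:]]*' | sort | uniq -c | sort -rn
}

# Assumed usage (the log group name is the Lambda default; adjust if yours differs):
# aws logs filter-log-events \
#   --log-group-name /aws/lambda/my-func \
#   --filter-pattern '"START RequestId"' \
#   --query 'events[].message' --output text | count_versions
```

Any count next to a version you did not expect is a caller holding a qualifier.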

Published Lambda versions are immutable snapshots. update-function-configuration modifies $LATEST and nothing else. Version 1 still had the pre-migration configuration — no VPC, no subnets, no security group. The production caller was invoking :1, so my VPC changes didn’t exist for it.

aws lambda get-function-configuration \
  --function-name my-func --query 'VpcConfig.VpcId'
# "vpc-091a341400414b680"

aws lambda get-function-configuration \
  --function-name my-func --qualifier 1 --query 'VpcConfig.VpcId'
# ""

My verification test invoked $LATEST because that’s what aws lambda invoke does by default. So does the console’s “Test” button. I tested the wrong version and called the migration done.
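A verification that matches production has to pass the same qualifier the caller uses. The sketch below wraps `aws lambda invoke` with `--qualifier` and prints the `ExecutedVersion` field from the response. `invoke_qualified` and the `response.json` output file are illustrative names, and it assumes the function tolerates an empty payload.

```shell
# Invoke a specific version (or alias) and print which version actually ran.
# $1 = function name, $2 = qualifier (version number or alias name)
invoke_qualified() {
  aws lambda invoke \
    --function-name "$1" \
    --qualifier "$2" \
    --query 'ExecutedVersion' --output text \
    response.json
}

# Assumed usage: should print 1, not $LATEST.
# invoke_qualified my-func 1
```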

publish-version creates a new immutable snapshot from $LATEST. After publishing Version 2, both versions exist side by side — but only Version 2 carries the VPC config. Anything still pointing at :1 keeps hitting the frozen pre-migration state.
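Publishing and then immediately reading back the new version's configuration closes the loop. This is a sketch with a hypothetical helper name, `publish_and_check`. One caveat: if $LATEST has not changed since the last publish, `publish-version` returns the existing version instead of creating a new one.

```shell
# Publish a new version from $LATEST, then read back its VPC id.
# $1 = function name
publish_and_check() {
  fn="$1"
  new_version=$(aws lambda publish-version --function-name "$fn" \
    --query 'Version' --output text) || return 1
  vpc_id=$(aws lambda get-function-configuration --function-name "$fn" \
    --qualifier "$new_version" --query 'VpcConfig.VpcId' --output text) || return 1
  echo "version=$new_version vpc=$vpc_id"
}

# Assumed usage:
# publish_and_check my-func
```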

The subtler gotcha: published versions don’t inherit resource policies. I published Version 2, updated the API Gateway integration to point at it, and got 500s. API Gateway couldn’t invoke Version 2 because the lambda:InvokeFunction permission only existed on $LATEST and Version 1. A separate add-permission --qualifier 2 was required. The API doesn’t warn you about this — publish-version succeeds, and the permission gap is silent until a caller tries.
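The grant for the new version might look like the sketch below. `grant_apigw_invoke` and the statement id format are made up for illustration, and the source ARN in the usage comment is a placeholder to replace with your own region, account, API id, and route.

```shell
# Allow API Gateway to invoke one specific published version.
# $1 = function name, $2 = version, $3 = execute-api source ARN
grant_apigw_invoke() {
  aws lambda add-permission \
    --function-name "$1" \
    --qualifier "$2" \
    --statement-id "apigw-invoke-v$2" \
    --action lambda:InvokeFunction \
    --principal apigateway.amazonaws.com \
    --source-arn "$3"
}

# Assumed usage (placeholder ARN):
# grant_apigw_invoke my-func 2 'arn:aws:execute-api:REGION:ACCOUNT_ID:API_ID/*/GET/path'
#
# Inspect the policy attached to that qualifier afterwards:
# aws lambda get-policy --function-name my-func --qualifier 2
```

Running `get-policy` against the new qualifier before switching the integration surfaces the gap without waiting for a caller to hit it.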

After updating the integration, granting the permission, and redeploying the stage: Version: 2 in CloudWatch, zero timeouts, 0.3s warm responses.

The mistake was assuming update-function-configuration meant “update the function.” It means “update $LATEST.” If published versions exist, and if anything invokes by qualifier, you haven’t updated what your callers actually run.

After any update-function-configuration:

aws lambda list-versions-by-function --function-name my-func \
  --query 'Versions[?Version!=`$LATEST`].Version'

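If that returns a non-empty list, published versions exist and none of them has your change, so run publish-version. Then check your aliases, your API Gateway integrations, and whatever else holds a version qualifier, and repoint them at the new version. Test the same path your production caller uses, not $LATEST.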
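The same check can be widened into a quick audit that shows each version's VPC id next to the aliases that point at versions. A rough sketch, with `audit_function` as a hypothetical helper name. It covers only Lambda-side qualifiers, so API Gateway integrations and other callers still need a separate look.

```shell
# Print every version with its VPC id, then every alias with its target version.
# $1 = function name
audit_function() {
  fn="$1"
  echo "== versions (version, VPC id) =="
  aws lambda list-versions-by-function --function-name "$fn" \
    --query 'Versions[].[Version, VpcConfig.VpcId]' --output text
  echo "== aliases (name, target version) =="
  aws lambda list-aliases --function-name "$fn" \
    --query 'Aliases[].[Name, FunctionVersion]' --output text
}

# Assumed usage:
# audit_function my-func
```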

#aws #lambda #vpc #debugging