Using AWS Lambda in VPC with S3

In the previous article, we discussed how to use AWS Lambda to interact with S3 (reading and writing objects, triggers, generating presigned URLs, etc.). If you want to understand the basics of using Lambda with S3, you can read that article first.

In this article, we're going to discuss various ways to connect to AWS S3 from a Lambda function that runs in a private subnet of your VPC.

Specifically, we're going to discuss how to run a Lambda function inside a private VPC subnet and access S3:

  • Using NAT Gateway
  • Using Gateway Endpoint
  • Using Interface Endpoint

How to use NAT Gateway to interact with S3 from Lambda

Let's consider a scenario where you have a Lambda function in a private subnet of a VPC and you want to interact with AWS S3 from that function. There is a small problem with this.

AWS S3 is a public service, which means it is accessed through a public endpoint (although you still need the necessary permissions to perform any operation). But a Lambda function attached to a VPC has no access to the public internet by default.

To reach S3 from the Lambda function, we can use a NAT Gateway to get access to the internet. The NAT Gateway sits in a public subnet and routes the traffic to the public internet through the VPC's Internet Gateway.

Below is the architecture diagram of this setup.

Lambda (in Private VPC) interacting with S3 using NAT Gateway

Infrastructure code

As part of the infrastructure code, we create a VPC, an S3 bucket, and the Lambda function.

VPC

In the AWS CDK code snippet below, we create a VPC with two subnet groups, one private and one public, spread across two Availability Zones (maxAzs: 2). Note that the private subnets are of type PRIVATE_WITH_EGRESS, which means only outbound connections are allowed (through a NAT Gateway that CDK places in the public subnets, one per AZ by default) and no connection can be initiated from the public internet into these subnets.

const vpc = new ec2.Vpc(this, 'VpcLambda', {
  maxAzs: 2,
  subnetConfiguration: [
    {
      cidrMask: 24,
      name: 'privatelambda',
      subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
    },
    {
      cidrMask: 24,
      name: 'public',
      subnetType: ec2.SubnetType.PUBLIC,
    },
  ],
});

Bucket

Below is the code snippet for creating the bucket. In non-production environments, we want the contents of the bucket to be destroyed when we tear down the stack, so we choose the removal policy based on an environment variable (isProd).

const bucketId = 'vpc-nat-gw';
// process.env values are strings, so compare explicitly ("false" would otherwise be truthy)
const isProd = process.env.isProd === 'true';
const isDev = !isProd;
const removalPolicy = isDev ? RemovalPolicy.DESTROY : RemovalPolicy.RETAIN;

const bucket = new s3.Bucket(this, 'S3Bucket', {
  bucketName: `aws-lambda-s3-${bucketId}`,
  autoDeleteObjects: isDev, // empty the bucket before deleting it (non-prod only)
  removalPolicy,
});
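Environment variables always arrive as strings (or undefined), so an expression like process.env.isProd ?? false falls back to false only when the variable is unset; a value of "false" is still a truthy string. The sketch below uses a hypothetical helper, parseBooleanFlag, and assumes that only the string "true" (case-insensitive) should mean production.

```typescript
// Sketch: parse a boolean flag from an environment-variable string.
// Assumption: only "true" (case-insensitive, surrounding spaces ignored) means production.
function parseBooleanFlag(value: string | undefined): boolean {
  return value?.trim().toLowerCase() === 'true';
}

// Hypothetical values a deployment pipeline might set for isProd.
const candidateValues: Array<string | undefined> = ['true', 'TRUE', 'false', '', undefined];
for (const value of candidateValues) {
  // `value ?? false` keeps the raw string, so "false" is still truthy.
  const naive = Boolean(value ?? false);
  console.log(`${String(value)}: parsed=${parseBooleanFlag(value)}, naive=${naive}`);
}
```

For the value "false", the helper returns false while the naive check is true, which would wrongly keep the bucket with RETAIN in a dev stack.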

Lambda function

The properties of the Lambda function are similar to what we used earlier. The only difference is that we choose the subnets that host our function: in the last few lines, we select the subnets by subnet type.

const nodeJsFunctionProps: NodejsFunctionProps = {
  bundling: {
    externalModules: [
      'aws-sdk', // Use the 'aws-sdk' available in the Lambda runtime
    ],
  },
  runtime: Runtime.NODEJS_16_X,
  timeout: Duration.minutes(3), // Default is 3 seconds
  memorySize: 256,
};

const readS3ObjVpcFn = new NodejsFunction(this, 'readS3ObjVpc', {
  entry: path.join(__dirname, '../src/lambdas', 'read-s3-obj-vpc.ts'),
  ...nodeJsFunctionProps,
  functionName: 'readS3ObjVpc',
  environment: {
    bucketName: bucket.bucketName,
  },
  vpc,
  vpcSubnets: vpc.selectSubnets({
    subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
  }),
});

bucket.grantRead(readS3ObjVpcFn);

Event Notification

We want the Lambda function to be invoked automatically when a user uploads an object to the S3 bucket. To achieve that, we add an S3 event source to the function:

readS3ObjVpcFn.addEventSource(
  new S3EventSource(bucket, {
    events: [s3.EventType.OBJECT_CREATED],
  })
);

Lambda function source code

The Lambda function code is pretty simple. We get the bucket name and object key from the S3 event, read the contents of the S3 object, and print them to the console (the output can be read in CloudWatch Logs).

import { S3Event } from 'aws-lambda';
import * as AWS from 'aws-sdk';

// Create the client once and reuse it across records and warm invocations
const s3 = new AWS.S3();

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucketName = record.s3.bucket.name;
    // Object keys arrive URL-encoded in S3 events (spaces become '+')
    const objectKey = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    const params = { Bucket: bucketName, Key: objectKey };
    const response = await s3.getObject(params).promise();
    const data = response.Body?.toString('utf-8') || '';
    console.log('file contents:', data);
  }
};

Testing

Upload a simple text file to the bucket, and the event notification will invoke the Lambda function. You can view the execution logs in CloudWatch.

CloudWatch logs of the Lambda function running in the VPC

Pros and Cons:

Pros:

  • The architecture is simple and straightforward to understand.
  • If your Lambda function also calls public APIs other than S3, those calls work too.

Cons:

  • A managed NAT Gateway is fairly costly. At the time of this writing, you're charged $0.045 for every hour the NAT Gateway runs and $0.045 for every GB of data it processes (us-east-1 prices). Prices vary by region, so check AWS pricing for the latest numbers.
  • A NAT Gateway is resilient only within its own Availability Zone, so for high availability each Availability Zone needs its own NAT Gateway. This increases the cost further.
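To put the cost bullets into numbers, here is a rough monthly estimate. It is a sketch under stated assumptions: us-east-1 list prices at the time of writing ($0.045 per NAT Gateway-hour and $0.045 per GB processed), about 730 hours in a month, one NAT Gateway per AZ, and no data-transfer-out charges. The helper name natGatewayMonthlyCost and the 100 GB traffic figure are hypothetical.

```typescript
// Rough monthly NAT Gateway cost estimate (assumed us-east-1 prices; check AWS pricing).
const NAT_HOURLY_USD = 0.045; // per NAT Gateway-hour
const NAT_PER_GB_USD = 0.045; // per GB processed
const HOURS_PER_MONTH = 730;  // approximate

function natGatewayMonthlyCost(azCount: number, gbProcessed: number): number {
  const hourly = azCount * HOURS_PER_MONTH * NAT_HOURLY_USD; // one NAT Gateway per AZ
  const processing = gbProcessed * NAT_PER_GB_USD;
  return hourly + processing;
}

// Two AZs (as in the VPC above) and a hypothetical 100 GB of S3 traffic per month.
console.log(natGatewayMonthlyCost(2, 100).toFixed(2)); // 70.20
```

Most of that figure is the hourly charge, which you pay even when the function is idle.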

How to use Gateway Endpoint to interact with S3 from Lambda

Instead of a NAT Gateway, we can use a gateway endpoint to access S3. A gateway endpoint adds a route to the subnet's route table whose destination is the S3 prefix list (the AWS-managed list of S3 IP ranges for the region) and whose target is the endpoint, so the VPC router can send S3 traffic to S3 without going over the public internet.
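The route-table behaviour can be illustrated with a small longest-prefix-match sketch: the most specific matching route wins, so S3-bound traffic takes the endpoint route even if a 0.0.0.0/0 route to a NAT Gateway exists. The CIDRs standing in for the S3 prefix list are hypothetical documentation ranges, and the real prefix list appears in a route table as a single pl-... entry rather than expanded CIDRs.

```typescript
// Illustrative sketch: how a route table with a gateway endpoint picks a target.
// Assumption: the S3 prefix list is modelled by hypothetical ranges, not real S3 IPs.
interface Route {
  destination: string; // IPv4 CIDR, e.g. "10.0.0.0/16"
  target: string;      // e.g. "local", "vpce-s3", "nat-gateway"
}

function ipToNumber(ip: string): number {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function cidrMatches(ip: string, cidr: string): boolean {
  const [base, bits] = cidr.split('/');
  const blockSize = 2 ** (32 - Number(bits)); // number of addresses in the block
  return Math.floor(ipToNumber(ip) / blockSize) === Math.floor(ipToNumber(base) / blockSize);
}

// Longest-prefix match: the most specific matching route wins.
function selectRoute(ip: string, routes: Route[]): Route | undefined {
  let best: Route | undefined;
  let bestBits = -1;
  for (const route of routes) {
    const bits = Number(route.destination.split('/')[1]);
    if (bits > bestBits && cidrMatches(ip, route.destination)) {
      best = route;
      bestBits = bits;
    }
  }
  return best;
}

const s3PrefixList = ['198.51.100.0/24', '203.0.113.0/24']; // hypothetical S3 ranges
const endpointRoutes: Route[] = s3PrefixList.map((cidr) => ({ destination: cidr, target: 'vpce-s3' }));

// Isolated subnet: local route plus gateway endpoint routes, no default route.
const isolatedRoutes: Route[] = [{ destination: '10.0.0.0/16', target: 'local' }, ...endpointRoutes];
// Subnet with egress: the same routes plus a default route to a NAT Gateway.
const egressRoutes: Route[] = [...isolatedRoutes, { destination: '0.0.0.0/0', target: 'nat-gateway' }];

console.log(selectRoute('198.51.100.25', egressRoutes)?.target); // vpce-s3
console.log(selectRoute('8.8.8.8', isolatedRoutes)?.target); // undefined
```

In the isolated subnet there is simply no route for non-S3 public addresses, which is why public APIs become unreachable.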

Below is the high-level architecture diagram.

Infrastructure

VPC

In the previous section, we used a PRIVATE_WITH_EGRESS subnet, which makes CDK create NAT Gateways. In this section, we use PRIVATE_ISOLATED, which means the subnet is completely private: no outbound or inbound connections to or from the public internet.

We also create a gateway endpoint for S3 through the gatewayEndpoints property when creating the VPC:

const vpc = new ec2.Vpc(this, 'VpcLambda', {
  maxAzs: 2,
  subnetConfiguration: [
    {
      cidrMask: 24,
      name: 'privatelambda',
      subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
    },
  ],
  gatewayEndpoints: {
    S3: {
      service: ec2.GatewayVpcEndpointAwsService.S3,
    },
  },
});

Lambda function

When creating the Lambda function, specify the isolated private subnets, as shown in the code snippet below:

const readS3ObjVpcFn = new NodejsFunction(this, 'readS3ObjVpc', {
  entry: path.join(__dirname, '../src/lambdas', 'read-s3-obj-vpc.ts'),
  ...nodeJsFunctionProps,
  functionName: 'readS3ObjVpc',
  environment: {
    bucketName: bucket.bucketName,
  },
  vpc,
  vpcSubnets: vpc.selectSubnets({
    subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
  }),
});

There is no change to the function source code: requests to the regular S3 endpoint are routed through the gateway endpoint.

Pros and Cons

Pros:

  • No NAT Gateway is needed, which brings down the total cost
  • Gateway endpoints are free
  • Your network traffic stays within the AWS network

Cons:

  • No public internet access: if your Lambda function in the private subnet calls any public APIs or endpoints, those calls will fail.
  • Gateway endpoints can't be reached from an on-premises network (for example, over VPN or Direct Connect).

How to use Interface Endpoint to interact with S3 from Lambda

An interface endpoint can also connect to an AWS service (S3 in this case) without a NAT Gateway, just like a gateway endpoint, but it is implemented differently.

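An interface endpoint uses AWS PrivateLink. The primary purpose of PrivateLink is to establish private connectivity between a VPC and AWS services, and it does so by placing network interfaces with private IP addresses in your subnets, so traffic to the service stays on the AWS network.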

Below is the high-level architecture of using an interface endpoint. When you create an interface endpoint, it creates an Elastic Network Interface (ENI) in each specified subnet. Because private DNS is not enabled for this endpoint (see below), we can't use the default public endpoint for S3. Instead, we have to use the endpoint-specific DNS name provided by the interface endpoint, which resolves to the private IP addresses of those ENIs.

Infrastructure

VPC

We create a VPC with a single PRIVATE_ISOLATED subnet, which means no outbound or inbound connections by default.

const vpc = new ec2.Vpc(this, 'VpcLambda', {
  maxAzs: 1,
  subnetConfiguration: [
    {
      cidrMask: 24,
      name: 'privatelambda',
      subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
    },
  ],
});

Interface endpoint

The code below creates an interface endpoint for the S3 service in the isolated subnet. At the time of this writing, Amazon S3 interface endpoints do not support the private DNS feature, so we set the privateDnsEnabled property to false.

const interfaceEndpoint = new ec2.InterfaceVpcEndpoint(
  this,
  'VPC S3 Interface Endpoint',
  {
    vpc,
    service: new ec2.InterfaceVpcEndpointAwsService('s3'),
    subnets: vpc.selectSubnets({
      subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
    }),
    privateDnsEnabled: false,
  }
);

Primary DNS name

An interface endpoint provides both regional and zonal DNS entries. The difference is that a zonal DNS name also includes the Availability Zone, in addition to the AWS region, the VPC endpoint ID, etc.

Below is one such example of DNS entries.

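We only need the regional DNS name. Each entry in vpcEndpointDnsEntries has the form hostedZoneId:dnsName, so the code below takes the first entry (the regional one) and keeps the part after the colon.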

    const firstEntry = cdk.Fn.select(
      0,
      interfaceEndpoint.vpcEndpointDnsEntries
    );
    const entryParts = cdk.Fn.split(':', firstEntry);
    const primaryDNSName = cdk.Fn.select(1, entryParts);
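Since Fn.select and Fn.split are CloudFormation tokens resolved at deploy time, it can help to see the same logic on plain strings. This is a sketch assuming each entry is hostedZoneId:dnsName with the regional name first; the hosted zone ID, endpoint ID, and the helper names regionalDnsName and isZonalDnsName are hypothetical.

```typescript
// Plain-TypeScript illustration of what the Fn.select / Fn.split tokens resolve to.
// Assumption: entries look like "<hostedZoneId>:<dnsName>", regional name first.
function regionalDnsName(dnsEntries: string[]): string {
  const firstEntry = dnsEntries[0] ?? '';   // Fn.select(0, ...)
  const entryParts = firstEntry.split(':'); // Fn.split(':', ...)
  return entryParts[1] ?? '';               // Fn.select(1, ...)
}

// A zonal name carries the AZ after the endpoint ID, e.g. "vpce-1a2b3c4d-5e6f-us-east-1a".
function isZonalDnsName(dnsName: string, availabilityZones: string[]): boolean {
  return availabilityZones.some((az) => dnsName.includes(`-${az}.`));
}

// Hypothetical entries in the assumed format.
const sampleEntries = [
  'ZEXAMPLE123:*.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com',
  'ZEXAMPLE123:*.vpce-1a2b3c4d-5e6f-us-east-1a.s3.us-east-1.vpce.amazonaws.com',
];

const regional = regionalDnsName(sampleEntries);
console.log(regional); // *.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com
console.log(isZonalDnsName(regional, ['us-east-1a'])); // false
```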

Lambda function properties

When creating the Lambda function, we specify the isolated subnet and pass the regional DNS name to the function through the endpoint environment variable.

const readS3ObjVpcFn = new NodejsFunction(this, 'readS3ObjVpc', {
  entry: path.join(__dirname, '../src/lambdas', 'read-s3-obj-vpc.ts'),
  ...nodeJsFunctionProps,
  functionName: 'readS3ObjVpc',
  environment: {
    bucketName: bucket.bucketName,
    endpoint: primaryDNSName,
  },
  vpc,
  vpcSubnets: vpc.selectSubnets({
    subnetType: ec2.SubnetType.PRIVATE_ISOLATED,
  }),
});

Lambda function source code

As mentioned earlier, you can't use the S3 public endpoint when you use the interface endpoint. To build the S3 endpoint, take the DNS name you got from the interface endpoint and replace * with bucket.

For example, the primary DNS name from the previous step would look like *.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com. Replacing the * with the word bucket gives bucket.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com, the DNS endpoint for connecting to S3 through the interface endpoint (which in turn uses PrivateLink).
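The string manipulation can be sketched on its own. This is a minimal illustration assuming the client uses virtual-hosted-style addressing by default (bucket name prefixed to the host) and path style only when forced, for example with s3ForcePathStyle: true in the v2 SDK. The bucket name, object key, and the helpers toBucketEndpoint and objectUrls are hypothetical.

```typescript
// Sketch: derive the S3 endpoint from the regional interface-endpoint DNS name and
// show the object URLs a client would request. Bucket name and key are hypothetical.
function toBucketEndpoint(regionalDnsName: string): string {
  // Replace the leading wildcard label "*" with "bucket".
  return regionalDnsName.replace(/^\*\./, 'bucket.');
}

function objectUrls(endpoint: string, bucket: string, key: string) {
  // Encode each path segment but keep the "/" separators.
  const encodedKey = key.split('/').map(encodeURIComponent).join('/');
  return {
    // virtual-hosted style: bucket name prefixed to the endpoint host
    virtualHosted: `https://${bucket}.${endpoint}/${encodedKey}`,
    // path style: bucket name as the first path segment
    pathStyle: `https://${endpoint}/${bucket}/${encodedKey}`,
  };
}

const endpoint = toBucketEndpoint('*.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com');
console.log(endpoint); // bucket.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com
console.log(objectUrls(endpoint, 'my-bucket', 'notes/hello world.txt').virtualHosted);
// https://my-bucket.bucket.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com/notes/hello%20world.txt
```

Anchoring the regular expression on the leading "*." avoids touching any other part of the name.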

The important thing to note when writing the Lambda function is that you need to pass this endpoint when creating the S3 client. Otherwise, the function will time out, because the default S3 endpoint resolves to public IP addresses that the isolated subnet can't reach.

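import { S3Event } from 'aws-lambda';
import * as AWS from 'aws-sdk';

// e.g. "*.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com"
//   -> "bucket.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com"
const primaryDns = process.env.endpoint || '';
const endpoint = primaryDns.replace('*', 'bucket');

// Point the client at the interface endpoint instead of the public S3 endpoint
const s3 = new AWS.S3({
  region: process.env.AWS_REGION, // set by the Lambda runtime
  endpoint,
});

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucketName = record.s3.bucket.name;
    // Object keys arrive URL-encoded in S3 events (spaces become '+')
    const objectKey = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    const params = { Bucket: bucketName, Key: objectKey };
    const response = await s3.getObject(params).promise();
    const data = response.Body?.toString('utf-8') || '';
    console.log('file contents(using interface):', data);
  }
};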

Pros and Cons

Pros:

  • You can connect to the AWS service without using the public internet
  • Allows access from your on-premises network

Cons:

  • You're charged for every hour, per AZ, that the interface endpoint is provisioned (PrivateLink pricing), plus a charge per GB of data processed
  • You have to change your Lambda code to use the endpoint-specific DNS name
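For comparison with the NAT Gateway option, here is a rough monthly estimate for one S3 interface endpoint. It is a sketch under stated assumptions: us-east-1 list prices at the time of writing ($0.01 per endpoint per AZ per hour and $0.01 per GB processed in the first pricing tier) and about 730 hours in a month. The helper name and the 100 GB figure are hypothetical; check AWS PrivateLink pricing for current numbers.

```typescript
// Rough monthly cost of one S3 interface endpoint (assumed us-east-1 prices).
const ENDPOINT_HOURLY_PER_AZ_USD = 0.01; // per endpoint, per AZ, per hour
const ENDPOINT_PER_GB_USD = 0.01;        // per GB processed (first tier)
const HOURS_IN_MONTH = 730;              // approximate

function interfaceEndpointMonthlyCost(azCount: number, gbProcessed: number): number {
  const hourly = azCount * HOURS_IN_MONTH * ENDPOINT_HOURLY_PER_AZ_USD;
  const processing = gbProcessed * ENDPOINT_PER_GB_USD;
  return hourly + processing;
}

// One AZ (as in the VPC above) and a hypothetical 100 GB of S3 traffic per month.
console.log(interfaceEndpointMonthlyCost(1, 100).toFixed(2)); // 8.30
```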

Please let me know your thoughts in the comments section.