How to run docker containers in AWS Lambda - along with CI/CD pipeline - Python version
In this article, you're going to learn about how to run docker containers in AWS Lambda using Python. Before discussing how to do that, let us discuss the need to run the lambda using docker.
The typescript version of this article is available here
Sometimes, your lambda function may require a huge number of dependencies or dependencies with large sizes. For example, you may want some large packages for machine learning. As you may know, the maximum size of packaged lambda is 250 MB. But if your package exceeds that limit, you can make use of docker containers as the maximum size of a docker image is 10GB.
Even though you're running docker containers in lambda, still you need to implement lambda API and the restriction of a maximum of 15 minutes of the execution time of lambda still applies.
Architecture
Below is the high-level architecture diagram of the simple application that we're going to build in this article
Lambda function code: This is where your application logic resides. We'll be creating DockerFile
in this repository so that we can package the code as a docker image
GitHub CI/CD Pipeline: Whenever code is pushed into the remote branch, we want the GitHub workflow to run which in turn will build and push the image to the AWS ECR repository
Lambda function: This lambda function will refer to the image in the ECR repository
Eventbridge Rule: We'll create an event bridge rule to create a schedule so that lambda can be triggered periodically based on the schedule.
Now, let's create an actual application.
Lambda with Docker
We're going to use python in this example. Create a new folder and initialize the project
mkdir aws-lambda-docker
cd aws-lambda-docker
Lambda function
I'm keeping my lambda functions pretty simple with a couple of log statements. In the real world, you'll be using a huge number of dependencies (or dependencies with large sizes), if you're using docker in lambda.
import boto3
import pandas as pd
def handler(event, context):
print(event)
data_set = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
tab_data = pd.DataFrame(data_set)
print(tab_data)
Dockerfile
We've created a lambda function. Now, it is time to create Dockerfile
so that we can create a docker image of this lambda function that we wrote earlier.
One of the requirements of using docker with lambda is that the base image that you use must implement the runtime interface client to manage the interaction between lambda and function code.
AWS provides many base images which already implement the runtime API that you can use. One such image is public.ecr.aws/lambda/python:3.8
Based on your language/framework - AWS provides many such base images here
Below is the Dockerfile
for lambda.
FROM public.ecr.aws/lambda/python:3.8
COPY requirements.txt .
RUN pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
COPY app.py ${LAMBDA_TASK_ROOT}
CMD [ "app.handler" ]
We're using the base image (which implements the Runtime API) provided by lambda and we copy the requirements.txt.
LAMBDA_TASK_ROOT
is a reserved environment variable used by AWS lambda that contains the path of the lambda function code. As we want to install the dependencies to this path, we're using this environment variable.
And, finally, we use the start command.
Infrastructure
As mentioned earlier, even though you use Docker with lambda, Lambda still works in the same way as earlier. Lambda will be executed as a result of some event - This event could be uploading the file to S3 or someone firing an HTTP event on API Gateway or a message to an SQS or from SNS and so on.
To make things simple, we're going to use EventBridge to create a schedule so that this lambda can be triggered periodically.
We're going to create a simple CDK project. This project will have 2 stacks - one for creating ECR Repository (let's call this RepoStack
) and another (let's call this LambdaStack
) for creating an event schedule.
The reason why we're creating 2 stacks is because of the dependency
- The first stack is for creating ECR Repository
- The second stack is for creating lambda function with the ECR image (to be provided by CI/CD pipeline) and for creating event schedule
Even though we create 2 stacks in the CDK project, we'll deploy the first stack and then build CI/CD pipeline so that image can be pushed to the ECR repository, and then finally we'll deploy the second stack.
mkdir aws-cdk-lambda-docker
cd acdk init app --language=python
Stack for creating ECR repository:
Below is the CDK code for creating the ECR repository.
class AwsCdkEcrStack(Stack):
def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
super().__init__(scope, construct_id, **kwargs)
repo = ecr.Repository(self, "docker-lambda-py",
repository_name="docker-lambda-py")
This is a pretty simple code where we just have to mention the name of the repository through the property repository_name
for the construct
Stack for creating Lambda and event schedule:
Below is the code for creating lambda and event schedule to trigger the lambda
class AwsCdkLambdaDockerStack(Stack):
def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
super().__init__(scope, construct_id, **kwargs)
repo = ecr.Repository.from_repository_name(
self, "Repo", "docker-lambda-py")
lambda_fn = _lambda.DockerImageFunction(
self, "docker-lambda-fn", code=_lambda.DockerImageCode.from_ecr(repo), function_name="docker-lambda-py-fn")
every_min_rule = events.Rule(self, 'every_min_rule',
schedule=events.Schedule.expression('rate(1 minute)'))
every_min_rule.add_target(events_targets.LambdaFunction(lambda_fn))
First, we're referring to the repo that we created earlier in another CDK stack. Please note that we're NOT creating a new repository here - we're just referring to the existing repository that we've created. Then, we create a lambda with code from the ECR repository image
And finally, we create an event schedule to run every minute and add lambda as a target for the event schedule. You can use any type of event (S3, SQS, SNS, etc..) to trigger lambda. Just for the sake of simplicity, I've created an event schedule.
Wiring the stacks in the CDK App:
You can modify the CDK App (located in the app.py
) as below
app = cdk.App()
AwsCdkEcrStack(app, "RepoStack")
AwsCdkLambdaDockerStack(app, "LambdaStack")
Deploying the first stack:
You can deploy the first stack by executing the below command
cdk deploy RepoStack
This will create an ECR repository by name docker-lambda-py
. Please note that we can't deploy the second stack as we don't have any images in the ECR repository yet. The image for the ECR repository would be pushed by the CI/CD pipeline which we're going to create next
CI/CD pipeline
The steps involved in creating a CI/CD pipeline for the lambda with docker
is pretty simple. You just need to check out the code. Then build, tag, and push the image to ECR
We're going to create CI/CD pipeline using GitHub Actions. If you're new to GitHub Actions - I've written a beginner's guide to GitHub Actions. Please read that first before proceeding.
Create a folder .github/workflows
in the root of your folder(where your lambda code is located) and create file by name deploy.yml
Below is the code for deploy.yml
file. Let me explain what the line in this workflow file does.
name: build & deploy lambda docker image to ECR
on: [push]
jobs:
deploy-lambda-docker-image:
runs-on: ubuntu-latest
permissions:
id-token: write
contents: read
steps:
- name: Checkout Code
uses: actions/checkout@v3
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v1
with:
role-to-assume: arn:aws:iam::853185881679:role/github-actions
aws-region: us-east-1
- name: Login to AWS ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v1
- name: Build, tag, and push image to Amazon
id: build-image
env:
ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
ECR_REPOSITORY: docker-lambda-py
IMAGE_TAG: latest
run: |
docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
echo "::set-output name=image::$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"
name
: The name property represents the name of the workflow. You can name anything you want
on
: We want this workflow to be executed on push
so that if any code is pushed to the remote branch - we want this workflow to run
jobs
: We have a single job deploy-lambda-docker-image
and it runs on ubuntu-latest
Instead of storing access key
and secret key
in your repository secrets, you can use OIDC from GitHub to connect to AWS. This approach is more secure and is recommended. And, we've added the required permissions to read the token.
Steps:
checkout code
: As the name implies, we're checking out the code in this step
Login to AWS ECR
: In this step, we're logging into the ECR service
Build, tag and push image to Amazon
: In this step, we're building the image, tagging it, and pushing it to ECR
Deploying the Event Schedule with Lambda
Now, it is time to deploy the other CDK stack that we created earlier.
cdk deploy LambdaStack
This will create an event schedule and trigger the lambda based on the schedule.
Now, the lambda will be triggered every minute and you can see the log statements in the CloudWatch service
Conclusion
Hope you learned on how to use Docker with Lambda using Python
Let me know your thoughts in comments