How to run cron jobs (on a schedule) using AWS Fargate

In this article, we're going to discuss how to run an AWS Fargate task periodically, on a schedule.

Why not use AWS Lambda to run cron jobs?

You can also use AWS Lambda to run a cron job. I've written an article on the same here. But one of the limitations of AWS Lambda is that a function times out after at most 15 minutes. If your cron job needs more time to execute, you may prefer AWS Fargate.

Just like AWS Lambda, AWS Fargate is serverless, meaning that you don't need to manage the underlying infrastructure and you're billed only for the duration that your Fargate task is running.

With respect to pricing, there is a small difference in billing granularity. Lambda is billed per millisecond, whereas AWS Fargate is billed per second with a 1-minute minimum.
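To make the two limits above concrete, here is a small sketch in TypeScript. The function and constant names are made up for illustration, and it deliberately leaves out actual prices (those vary by region and task size). It assumes Fargate rounds the duration up to the nearest second before applying the 1-minute minimum.

```typescript
// Lambda's maximum function timeout: 15 minutes.
const LAMBDA_MAX_TIMEOUT_SECONDS = 15 * 60;

// Fargate bills per second with a 1-minute minimum.
const FARGATE_MINIMUM_BILLED_SECONDS = 60;

/** True when a job of this length cannot finish inside one Lambda invocation. */
function exceedsLambdaTimeout(runtimeSeconds: number): boolean {
  return runtimeSeconds > LAMBDA_MAX_TIMEOUT_SECONDS;
}

/** Seconds Fargate would bill for a task that ran for `runtimeSeconds` (assumed rounding up). */
function fargateBilledSeconds(runtimeSeconds: number): number {
  return Math.max(FARGATE_MINIMUM_BILLED_SECONDS, Math.ceil(runtimeSeconds));
}

console.log(exceedsLambdaTimeout(20 * 60)); // true: a 20-minute job does not fit in Lambda
console.log(fargateBilledSeconds(20)); // 60: the 1-minute minimum applies
console.log(fargateBilledSeconds(90.2)); // 91: rounded up to the next second
```

So a very short job that runs often may still be cheaper on Lambda, while anything that can run past 15 minutes needs something like Fargate.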

Note: We'll be using AWS CDK for creating all necessary AWS resources in this article. It's an open-source software development framework that lets you define cloud infrastructure. AWS CDK supports many languages including TypeScript, Python, C#, Java, and others. You can learn more about AWS CDK from a beginner's guide here. We're going to use TypeScript in this article.

How to Schedule

We're going to use Amazon EventBridge rules for scheduling. You can use either a rate expression or a cron expression.

Rate expressions

Rate expressions are the simpler format for representing schedules. A rate expression is a string value that follows the format:

rate(value unit)

value must be a positive integer, and unit can be any of the values below:

  • minute
  • minutes
  • hour
  • hours
  • day
  • days

If the value is equal to 1, the unit must be singular. If the value is greater than 1, the unit must be plural.

Examples:

rate(1 minute): runs every minute

rate(10 minutes): runs every 10 minutes

rate(1 hour): runs every hour

rate(10 hours): runs every 10 hours

rate(1 day): runs every day

rate(10 days): runs every 10 days
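The singular/plural rule is easy to get wrong when the value comes from configuration. Below is a minimal sketch of a helper that builds the string for you. The name rateExpression and the RateUnit type are hypothetical, not part of any AWS library.

```typescript
// The three base units; the plural form is derived from the value.
type RateUnit = "minute" | "hour" | "day";

/** Builds a rate expression, picking the singular or plural unit based on the value. */
function rateExpression(value: number, unit: RateUnit): string {
  if (!Number.isInteger(value) || value < 1) {
    throw new Error(`rate value must be a positive integer, got ${value}`);
  }
  const unitText = value === 1 ? unit : `${unit}s`;
  return `rate(${value} ${unitText})`;
}

console.log(rateExpression(1, "minute")); // rate(1 minute)
console.log(rateExpression(10, "hour")); // rate(10 hours)
```

The resulting string can be used anywhere a rate expression is expected.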

Cron expressions

Cron expressions are the most commonly used format for running background jobs in both legacy and modern systems. They have been used in UNIX and Linux environments for a long time.

A cron expression has the fields below:

minutes: This represents the minute of the hour and can be from 0-59

hours: This represents the hour of the day and can be from 0-23

day-of-month: This represents the day of the month and can be from 1-31

month: This represents the month of the year, either from 1-12 or from JAN-DEC

day-of-week: This represents the day of the week, either from 1-7 or from SUN-SAT

year: This represents the year and can be from 1970-2199

Restriction: You can't specify values for both day-of-month and day-of-week in the same expression. If you specify a value (or a *) in one of these fields, you must use ? in the other.

If you're using CDK, there are 2 ways to define cron expressions:

  • string format
  • object format

Cron expression in string format:

This is the classic format. The string follows the format below:

cron(minutes hours day-of-month month day-of-week year)
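To illustrate the idea behind the object format and the ? restriction, here is a rough sketch that turns an object of fields into the string format. The CronFields interface and the cronExpression function are hypothetical names of my own. CDK has its own Schedule.cron helper that takes an options object in a similar spirit, and its field names may differ from these. Note that EventBridge rules evaluate cron expressions in UTC.

```typescript
// Hypothetical field names for illustration; omitted fields default to "*".
interface CronFields {
  minute?: string;
  hour?: string;
  dayOfMonth?: string;
  month?: string;
  dayOfWeek?: string;
  year?: string;
}

/** Renders cron(minutes hours day-of-month month day-of-week year). */
function cronExpression(fields: CronFields): string {
  if (fields.dayOfMonth !== undefined && fields.dayOfWeek !== undefined) {
    throw new Error("Specify either dayOfMonth or dayOfWeek, not both");
  }
  const minute = fields.minute ?? "*";
  const hour = fields.hour ?? "*";
  const month = fields.month ?? "*";
  const year = fields.year ?? "*";
  // Whichever day field is not used becomes "?".
  const dayOfWeek = fields.dayOfWeek ?? "?";
  const dayOfMonth = fields.dayOfMonth ?? (fields.dayOfWeek === undefined ? "*" : "?");
  return `cron(${minute} ${hour} ${dayOfMonth} ${month} ${dayOfWeek} ${year})`;
}

// Every day at 04:00 UTC
console.log(cronExpression({ minute: "0", hour: "4" })); // cron(0 4 * * ? *)
// Monday to Friday at 09:30 UTC
console.log(cronExpression({ minute: "30", hour: "9", dayOfWeek: "MON-FRI" })); // cron(30 9 ? * MON-FRI *)
```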

We're going to use a rate expression for scheduling the cron job. However, you can choose either format based on your preference.

Our cron job code

We're going to create a new Node.js project containing the actual code that needs to be executed on a schedule. We'll create app.js with a bunch of console.log statements simulating a long-running job.

console.log('simulating a long running task');
console.log('task running....');
console.log('task still running....');
console.log('task still running....');
console.log('task still running....');
console.log('task finished....');
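If you want the simulation to actually take some time, a variation like the one below waits between log lines. This is only a sketch in TypeScript. The names sleep and runJob, the number of steps, and the delay are all made up for illustration.

```typescript
/** Resolves after roughly `ms` milliseconds. */
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/** Logs a start line, one line per step with a pause before each, then a finish line. */
async function runJob(steps: number, stepDelayMs: number): Promise<string[]> {
  const lines: string[] = [];
  const emit = (line: string): void => {
    lines.push(line);
    console.log(line);
  };

  emit("simulating a long running task");
  for (let step = 1; step <= steps; step++) {
    await sleep(stepDelayMs);
    emit(`task running, step ${step} of ${steps}`);
  }
  emit("task finished");
  return lines;
}

// Three steps, one second apart.
runJob(3, 1000);
```

The container stops once the returned promise settles and nothing else is pending, which is also when the Fargate task ends.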

Dockerfile

The Dockerfile is pretty simple. We use a Node.js Alpine image as the base image, copy the package.json files, install the dependencies, copy the rest of the code and run app.js.

FROM node:16-alpine

WORKDIR /usr/src/app

COPY package*.json ./
RUN npm install
COPY . .

EXPOSE 80
CMD [ "node", "app.js" ]

Then, you can either push the image manually to an ECR repository or build a CI/CD pipeline to push the images. I strongly recommend using a CI/CD pipeline as it is easier to maintain. I've written a detailed guide on how to build a CI/CD pipeline to Fargate here. You can read that article if you want to build a CI/CD pipeline using GitHub Actions.

Infrastructure

A Fargate task requires a VPC and runs inside it. As you may know, a VPC can have different kinds of subnets.

Which subnet to use for the Fargate task

EventBridge starts your task through the ECS API, so the task itself doesn't need to talk to EventBridge. It does, however, need to reach AWS services on the public AWS network, such as ECR to pull its image and CloudWatch to write its logs. That leaves you with 3 options for hosting your Fargate task:

  • In a public subnet of your VPC
  • In a private subnet with a NAT Gateway
  • In a private subnet, reaching those AWS services through VPC endpoints

There are pros and cons to each approach. I've discussed in detail the various ways of connecting to a public AWS service such as S3 from a private subnet in a VPC here, and the same approach applies. Below is a short summary.

Placing your Fargate task in a public subnet is simple, but the task then needs a public IP address to reach AWS services, and it can become reachable from the internet if its security group allows inbound traffic. This can be a big NO from a security standpoint.

You can run the Fargate task in a private subnet with a NAT Gateway. The advantage is that the task can still access the internet, but you have to pay for the NAT Gateway, which may get costly depending on your usage and budget.

You can also place your Fargate task in a private subnet without a NAT Gateway and use VPC endpoints to reach the AWS services it needs, such as ECR and CloudWatch Logs. I've discussed using an endpoint with S3 here, and the approach remains the same for other services.

In this article, we'll use a private subnet with a NAT Gateway: it is secure, the Fargate task can connect to the internet, and no one on the internet can connect to the task.

Let's create the VPC using AWS CDK:

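// At the top of the stack file (aws-cdk-lib v2)
import * as ec2 from "aws-cdk-lib/aws-ec2";

// Inside the stack's constructor
const vpc = new ec2.Vpc(this, "FargateNodeJsVpc", {
  maxAzs: 2,
  natGateways: 1, // a single NAT gateway shared by the private subnets
  subnetConfiguration: [
    {
      // public subnets host the NAT gateway
      cidrMask: 24,
      name: "ingress",
      subnetType: ec2.SubnetType.PUBLIC,
    },
    {
      // private subnets with outbound access through the NAT gateway
      cidrMask: 24,
      name: "application",
      subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
    },
  ],
});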

Then, create a cluster:

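// At the top of the stack file
import * as ecs from "aws-cdk-lib/aws-ecs";

// The ECS cluster that the scheduled task runs in
const cluster = new ecs.Cluster(this, "Cluster", {
  vpc,
  clusterName: "fargate-node-cluster",
});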

We'll be using an Amazon ECR repository for storing container images. We refer to that ECR repository (which we created earlier) using the code snippet below.

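// At the top of the stack file
import * as ecr from "aws-cdk-lib/aws-ecr";

// Look up the existing repository by name instead of creating a new one
const repo = ecr.Repository.fromRepositoryName(
  this,
  "FargateNodeJsRepo",
  "fargate-node-app"
);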

We're going to use one of the established AWS CDK patterns for running a scheduled Fargate task. We create the scheduled Fargate task as shown in the code snippet below.

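// At the top of the stack file
import * as ecsPatterns from "aws-cdk-lib/aws-ecs-patterns";
import { Schedule } from "aws-cdk-lib/aws-applicationautoscaling";

const scheduledFargateTask = new ecsPatterns.ScheduledFargateTask(
  this,
  "ScheduledFargateTask",
  {
    // the VPC is taken from the cluster, so it is not passed separately
    cluster,
    // run the task in the private subnets behind the NAT gateway
    subnetSelection: {
      subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
    },
    scheduledFargateTaskImageOptions: {
      // uses the "latest" image tag unless another tag is given
      image: ecs.ContainerImage.fromEcrRepository(repo),
      memoryLimitMiB: 512,
      cpu: 256,
    },
    // start a new task every 30 minutes
    schedule: Schedule.expression("rate(30 minutes)"),
    platformVersion: ecs.FargatePlatformVersion.LATEST,
  }
);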

Testing

You can deploy the stack using the cdk deploy command and then see the logs of your task in CloudWatch.

Fargate logs in CloudWatch

Please let me know your thoughts in the comments.