How to Unzip files from S3 Bucket using AWS Lambda

How to Unzip files from S3 Bucket using AWS Lambda

This article will discuss how to unzip files automatically to the target bucket when you upload a zip file in the source bucket.

Unzipping files to target S3 bucket
Unzipping files to target the S3 bucket
💡
TLDR: The idea is to use decompress npm package to unzip the files and write them to /tmp directory and then copy all the files to the target bucket

I've written a similar article on how to untar file here

Before discussing how to unzip the files using AWS Lambda- let us create a zip file from a directory. When you normally zip a file using the zip utility on mac, it'll create another directory __MACOSX which would contain metadata. For this tutorial, we would not need that as we just want to zip only the files in source directory and not any metadata.

You can use the below command to zip only the source files without any metadata directory.

zip -r topDirectory.zip topDirectory  -x '**/.*' -x '**/__MACOSX'

Please note that the directory topDirectory contains the text files across multiple sub-directories.

Infrastructure

We're going to use AWS CDK for creating the necessary infrastructure. It's an open-source software development framework that lets you define cloud infrastructure. AWS CDK supports many languages including TypeScript, Python, C#, Java, and others. You can learn more about AWS CDK from a beginner's guide here. We're going to use TypeScript in this article.

At high level, we just need 3 resources

  • Source S3 bucket
  • Lambda function to unzip the files
  • Target S3 bucket

Creation of buckets

I'm using the below code to create a source bucket( ZipBucket ) and target bucket ( UnzipBucket ) - random numbers are appended to the bucket names just to make the bucket names unique.

const zipBucket = new s3.Bucket(this, "ZipBucket", {
      bucketName: "zip-bucket-07071005",
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });

    const unzipBucket = new s3.Bucket(this, "UnzipBucket", {
      bucketName: "unzip-bucket-07071005",
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });

Creation of Lambda Function

We would be using Node18 runtime environment and we'll be giving necessary permissions to this lambda function. This lambda function may need read access to the source s3 bucket and write permission to the target s3 bucket so that the lambda function can unzip files and write to the target bucket.

And, we would need to trigger the lambda whenever a zip file is uploaded to the source s3 bucket

const unzipBucket = new s3.Bucket(this, "UnzipBucket", {
      bucketName: "unzip-bucket-07071005",
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });

    const nodeJsFunctionProps: NodejsFunctionProps = {
      runtime: Runtime.NODEJS_18_X,
      timeout: Duration.minutes(3), // Default is 3 seconds
      memorySize: 256,
    };

    const unzipFn = new NodejsFunction(this, "unzipFn", {
      entry: path.join(__dirname, "../src/lambdas", "unzip.ts"),
      ...nodeJsFunctionProps,
      functionName: "unzipFn",
      environment: {
        TARGET_BUCKET: unzipBucket.bucketName,
      },
    });



    unzipBucket.grantWrite(unzipFn);
    zipBucket.grantRead(unzipFn);

    unzipFn.addEventSource(
      new S3EventSource(zipBucket, {
        events: [s3.EventType.OBJECT_CREATED],
      })
    );

Lambda function source code

The actual unzipping happens in this lambda function. We're going to use decompress npm package to unzip the files. This decompress package requires a file system path to write to. So, we're going to give a path  /tmp/unzipped  for storing the unzipped files. Then, finally we're going to upload the files to the target s3 bucket from the local file system.

import { S3Event } from "aws-lambda";
import decompress from "decompress";
import * as fs from "fs";
import * as path from "path";
import {
  S3Client,
  GetObjectCommand,
  GetObjectCommandInput,
  PutObjectCommand,
  PutObjectCommandInput,
} from "@aws-sdk/client-s3";
import { Readable } from "stream";

export const handler = async (
  event: S3Event,
  context: any = {}
): Promise<any> => {
  const region = process.env.AWS_REGION || "us-east-1";
  const targetBucketName = process.env.TARGET_BUCKET || "";
  const client = new S3Client({ region });

  for (const record of event.Records) {
    try {
      const bucketName = record?.s3?.bucket?.name || "";
      const objectKey = record?.s3?.object?.key || "";

      //get object from s3
      const params: GetObjectCommandInput = {
        Bucket: bucketName,
        Key: objectKey,
      };
      const localFilePath = path.join("/tmp", objectKey);
      const localFileStream = fs.createWriteStream(localFilePath);
      const command = new GetObjectCommand(params);
      const response = await client.send(command);
      let readableStream: Readable = response.Body as Readable;
      readableStream.pipe(localFileStream);

      //unzip file
      const files = await decompress(localFilePath, "/tmp/unzipped");

      for (const file of files) {
        const filePath = file.path.replace("/tmp/unzipped", "");
        console.log("targetBucketName:", targetBucketName);
        console.log("filePath", filePath);
        console.log("file.data", file.data);

        const putObjectParams: PutObjectCommandInput = {
          Bucket: targetBucketName,
          Key: filePath,
          Body: file.data,
        };

        const response = await client.send(
          new PutObjectCommand(putObjectParams)
        );
        console.log("response", response);
      }
    } catch (err: any) {
      console.log(err);
    }
  }
};

Please let me know your thoughts in the comments.