Using incron to automatically copy data to S3

Overview

This is not strictly related to Matillion ETL; however, getting data into S3 from an on-premises server or other system is a common task. This approach is especially useful in micro-batching scenarios.

The instructions below should be enough to get you started with incron. However, it is a powerful tool that is capable of much more than this article can cover.

 

Installation

To begin, incron must be installed. This daemon watches a directory for changes and runs a command when one occurs. To install it on Amazon Linux, run:

sudo yum install incron

Once installed, delete the /etc/incron.allow file if it exists. This file is used to whitelist who can use the tool, so removing it lifts that restriction.

sudo rm /etc/incron.allow

Alternatively, keep the file and whitelist your user by adding the username to it.
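
As a minimal sketch of the whitelisting alternative, assuming the allow file holds one username per line, the hypothetical helper below (allow_user is not part of incron) appends a user only if that user is not already listed:

```shell
# Hypothetical helper: add a user to an incron allow file.
# Usage: allow_user <user> [file]
# Assumption: the file lists one username per line.
allow_user() {
    user="$1"
    file="${2:-/etc/incron.allow}"
    # -q: quiet, -x: match the whole line, -F: treat the name as a literal string
    if ! grep -qxF "$user" "$file" 2>/dev/null; then
        printf '%s\n' "$user" >> "$file"
    fi
}
```

Writing to /etc/incron.allow needs root privileges, so the function would have to be called from a root shell, for example: allow_user ed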

Setting up aws-cli

The aws-cli is already installed on an Amazon Linux machine. If it is not installed on yours, use the instructions here. Then run:

aws configure

Follow the on-screen instructions. Once the aws-cli is configured, create the incron table entry that will copy your files. Run:

incrontab -e

This opens your incron table (the list of inotify watches) in an editor. Add a line of this form:

<path to local watched directory> IN_CLOSE_WRITE /usr/local/bin/aws s3 cp $@/$# s3://<bucket name><prefix path>

Example:

/home/ed/lntest IN_CLOSE_WRITE /usr/local/bin/aws s3 cp $@/$# s3://matillion/delme/
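
In the entry above, incron replaces $@ with the watched directory and $# with the name of the file that triggered the event. The sketch below only imitates that substitution so the resulting command is visible; expand_entry and the file name report.csv are hypothetical and used purely for illustration:

```shell
# Hypothetical helper: print the command incron would run for one event.
# Usage: expand_entry <watched dir> <file name> <S3 destination>
expand_entry() {
    watched_dir="$1"   # what incron substitutes for $@
    file_name="$2"     # what incron substitutes for $#
    dest="$3"          # the S3 destination from the incrontab line
    printf '%s\n' "/usr/local/bin/aws s3 cp $watched_dir/$file_name $dest"
}

expand_entry /home/ed/lntest report.csv s3://matillion/delme/
# prints: /usr/local/bin/aws s3 cp /home/ed/lntest/report.csv s3://matillion/delme/
```

IN_CLOSE_WRITE fires when a file that was opened for writing is closed, so the copy starts only after the writer has finished with the file.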

To test, add a file to the watched directory; it should appear in S3 shortly afterwards. If anything goes wrong, the error is written to the system log, /var/log/syslog (the exact file can vary by distribution).
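
When a test file does not show up in S3, filtering the log for incron entries is a quick first check. The sketch below is a hypothetical helper (incron_log is not part of incron); it assumes the log path named in this article, which may be different on your system:

```shell
# Hypothetical helper: show the most recent incron-related log lines.
# Usage: incron_log [log file] [number of lines]
# Assumption: the default path is the one named in this article.
incron_log() {
    grep -i 'incron' "${1:-/var/log/syslog}" | tail -n "${2:-20}"
}
```

Reading the system log may require root privileges. To confirm a successful upload, aws s3 ls s3://matillion/delme/ lists the objects under the example prefix.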

The full incron documentation is worth a read and is available here.