Bash Script Component

Run a Bash script.

The script is executed in an external bash process hosted by the Matillion ETL instance. Any errors encountered while running the script will immediately halt it.
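A script can also make the halt-on-error behavior explicit instead of relying on how the component starts bash. The following is a minimal sketch using standard Bash options. The `false` command is only a stand-in for a failing step, and the inner script runs in a child bash so that the demonstration itself can report the result.

```shell
#!/usr/bin/env bash
# Run a small script in a child bash so this demo keeps running afterwards.
status=0
output=$(bash -c '
  set -euo pipefail   # exit on error, on unset variables, and on pipeline failures
  echo "step 1"
  false               # stand-in for a failing command
  echo "step 2"       # never reached
') || status=$?

echo "output: $output"
echo "exit status: $status"
```

In a real script, only the `set -euo pipefail` line would normally be needed, placed at the top.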

Since Matillion ETL runs on Linux, the usual command line tools are available. Furthermore, the credentials stored in your current environment are exported into the shell, so you may (and indeed should!) omit security keys from your scripts when calling cloud APIs.

All the usual Matillion ETL variables are made available in the bash environment, but any changes made to such variables are not visible outside of the current script execution.
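The scoping rule can be illustrated with an ordinary child process, which receives a copy of the exported environment and cannot change the parent's copy. This is only an analogy under that assumption; `report_month` is a hypothetical variable name, not one defined by Matillion ETL.

```shell
#!/usr/bin/env bash
# Hypothetical variable standing in for a Matillion ETL variable.
export report_month="2024-01"

# The child process sees the exported value and can change its own copy.
child_view=$(bash -c 'report_month="changed"; echo "$report_month"')

echo "child saw:    $child_view"
echo "parent still: $report_month"
```

The child prints its modified value, while the parent's value is unchanged after the child exits.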

Matillion ETL runs as the Tomcat user, so care must be taken to ensure that this user has sufficient access to the resources the script needs and does not uninstall any customer-installed Bash libraries.
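When a script fails with permission errors, a quick check of the effective user and its access to a path can help. The sketch below uses only standard tools. The directory is a hypothetical placeholder and should be replaced with whatever path the script actually needs.

```shell
#!/usr/bin/env bash
# Report which user the script runs as.
run_user=$(id -un)
echo "Running as: $run_user"

# Hypothetical working directory; replace with the path the script needs.
work_dir="${TMPDIR:-/tmp}"

# Check read and write access for the current user.
if [ -r "$work_dir" ] && [ -w "$work_dir" ]; then
  access="ok"
else
  access="denied"
fi
echo "Access to $work_dir: $access"
```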

Cancellation and Timeout

If you cancel a task while a Bash script is running, the script is killed. If the timeout is exceeded, the script is also killed. The purpose of the timeout is to ensure that scripts do not run forever, even if they enter an infinite loop or are blocked by an external resource.
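Independently of the component's timeout, a script can put its own, shorter limit on a single command that might block. The following sketch assumes the GNU coreutils `timeout` command is available on the instance; `sleep 5` stands in for a call that might hang.

```shell
#!/usr/bin/env bash
# Give a potentially blocking command its own, shorter limit.
# "sleep 5" is a stand-in for a call that might hang.
status=0
timeout 1 sleep 5 || status=$?

# GNU timeout exits with status 124 when the time limit is reached.
if [ "$status" -eq 124 ]; then
  result="timed out"
else
  result="finished with status $status"
fi
echo "$result"
```

This lets the script report which step timed out, rather than having the whole script killed without a message.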

Properties

Property | Setting | Description
Name | Text | The descriptive name for the component.
Script | Text | The bash script to execute. Output from commands should be brief as it is sent into the task status message.
Timeout | Integer | Number of seconds to wait for script termination. After Timeout seconds, the script is forcibly terminated. The default is 360 seconds (6 minutes).
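One way to keep the task message brief is to send verbose command output to a file and echo only a short summary. This is a sketch, assuming the Tomcat user can write to the temporary directory; `seq` is a stand-in for a chatty command.

```shell
#!/usr/bin/env bash
# Send verbose output to a log file and echo only a one-line summary.
log_file=$(mktemp)

# Stand-in for a chatty command: prints 1000 lines.
seq 1 1000 > "$log_file" 2>&1

# Count the lines captured, stripping any padding wc may add.
line_count=$(wc -l < "$log_file" | tr -d ' ')
summary="Wrote $line_count lines to $log_file"
echo "$summary"
```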

Strategy

Runs the script, redirecting any output it produces into the task message.

Example

In this example we are staging a small amount of data to form part of a monthly report. We want to back up this data at the end of the run, so we use a Bash Script component to create a copy of the staged data in the S3 bucket. The job is shown below.

The Bash Script component properties are simple. The component is optionally given a name, then the script is written and a timeout duration is specified.

The script used is shown below. This simply archives the data staged on S3 into a separate 'backups' directory. In this case, the name of the data directory is given by a variable that is set at the beginning of the job (variables may be declared through Project → Manage Environment Variables).

As an embellishment, the row count of the data staging component is exported to another variable that was set up in advance.
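A script along these lines might look like the following sketch. The bucket name and the variable names `data_dir` and `staged_rows` are hypothetical; in a real job they would come from Matillion ETL variables rather than the defaults set here. The sketch defaults to a dry run that only prints the AWS CLI command, so it can be tried without touching S3. Set `DRY_RUN=0` to perform the copy.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names and defaults; in a job these would be supplied
# by Matillion ETL variables instead of being set in the script.
data_dir="${data_dir:-monthly_report}"
staged_rows="${staged_rows:-0}"
bucket="${bucket:-example-bucket}"

src="s3://${bucket}/${data_dir}/"
dest="s3://${bucket}/backups/${data_dir}/"

# Build the copy command as an array so it can be printed or executed.
backup_cmd=(aws s3 cp "$src" "$dest" --recursive)

if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "Would run: ${backup_cmd[*]}"
else
  "${backup_cmd[@]}"
fi

# Brief summary for the task message.
echo "Backed up ${data_dir} (${staged_rows} rows)"
```

No access keys appear in the script, since the environment's credentials are exported into the shell.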

When run, the job will end with a backup of the data being created. Due to the echo command, the name of the file and the rowcount will be printed in the Tasks console.