Variables

Overview

Note: Matillion ETL supports Environment Variables (+Automatic Variables), Job Variables and Grid Variables. For detailed information on each type, please visit the below links. For more general information on variables in Matillion ETL, read on.


Variables are name-value pairs stored within each environment. Variables can be used in all sorts of parameters and expressions to allow the user to pass and centralise environment specific configuration. They are also easily accessible through Python and Bash scripts where they are substituted in at run time.

Warning: Variables become first class variables in Python and Bash scripts and great care should be taken to avoid naming them in a manner that clashes with key words in either language. We recommend using a prefix (for example, v_) to ensure no such conflicts occur.

Variables can be referenced with the syntax:

${<variable name>}

When a job is run, variables are resolved by first consulting job variables, then environment variables, then system variables. Thus, if a job variable and environment variable of the same name exists, the job variable will be preferentially used.

All alpha-numberic characters as well as underscores are valid for variable names. Note, however, that a variable name cannot begin with a digit.

${my_table_82}        Is a valid variable name.

${82_my_table}        Is not a valid variable name.

In addition, variables can be javascript expressions. Such as:

${new Date().getFullYear()}
 


Behaviour (Copied and Shared)

Throughout this article, variables may be referred to as 'Copied' and 'Shared'. This refers to their 'branch behaviour' inside a job. A 'branch' in this context means a divergence of connectors within a job, giving a branched structure. Iterator components are also examples of branching, albeit with a different aesthetic.
The branch behaviour describes how a variable behaves when updated during a job run.



Copied variables can be updated within one branch of a job without updating variables of the same name in other branches. A branch using a copied variable will begin by taking its default value, even if this variable has been updated in another, parallel branch. Conversely, Shared variables are updated in a job-wide fashion. As such, regardless of branches, if a Shared variable is updated in one branch, all other branches will use that updated value.

Note: In previous versions of Matillion ETL, 'Copied' and 'Shared' variables have been referred to as 'Local' and 'Global', respectively, and can be thought of synonymously when reading the documentation.

 

Types

​The value of any given variable must be one of the following types. This type defines how a variable can be used and is important to consider when passing the variable to components and functions. In the Matillion ETL client, types are often referred to by the symbols shown beside each data type below.
 

Type Description
 Text This type can hold any type of data, subject to a maximum size. More...
 Numeric This type is suitable for numeric types, with or without decimals. More...
Real and Double Precision types are available in some features.
 DateTime This type is suitable for dates, times and timestamps. More...
Type Description
 Text This type can hold any type of data, subject to a maximum size. More...
 Numeric This type is suitable for numeric types, with or with decimals. More...
 DateTime This type is suitable for dates and times. More...
Type Description
 Text: This type can hold any type of data, subject to a maximum size. More...
 Numeric: This type is suitable for whole-number types (no decimals). More...
 DATETIME: This type is suitable for dates, times or timestamps (both date and time). More...

For DateTime variables, Matillion ETL also comes with some useful proprietary methods. See Date and Time Methods for more information...

 

Example 1

As a most basic example, we can use a defined variable to perform a simple calculation in the following Transformation job:

The job consists of a 'Generate Sequence' component linked to a 'Calculator' component. We can use the 'Generate Sequence' component to create a column of numbers that are passed to the Calculator. First, select the 'Generate Sequence' component and set its Starting, Increment and End values to 1, 1 and 10, respectively (below, left). This will create a column of data that counts from 1 to 10 (below, right).

Selecting the Calculator component, we can enter the Calculation Editor, Adding a new calculation, we can multiple values in the 'sequence' row (output by the 'Generate Sequence' component) and multiply that value by a number. In this case, we multiply each value in the 'sequence' column by 4 using the following calculation:

"sequence" * 4

The Calculator's output can be viewed by selecting the 'Sample' tab while the Calculator component is selected and clicking 'Retrieve Sample'. Clearly, the sequence column is being multiplied by the number specified in the calculation.

However, we could have used a variable in this calculation, instead. First a variable must be declared in the 'Project' drop down menu and clicking 'Manage Environment Variables'. A new variable can be declared here and given a type, value and scope. Since the variable for this example will be used in a simple calculation, the 'Numeric' type is appropriate. The Scope is of little consequence for this job and can be set to Shared. Finally, the default value can be set to any number - we choose 6 in this example.

Finally, the Calculator component must be instructed to use this variable in its calculation. Reentering the Calculation Editor, the constant multiplier can be replaced with the newly-declared variable using the following calculation: 

"sequence" * ${example_var}

Checking the sample output confirms that the Calculator component has correctly used the variable's default value in the calculation.


 

Example 2

In this example we have two environments, 'Test' and 'Live'. A common use case would be to work on a more limited data set in 'Test' but have a full data set in 'Live'. We have a starting_year variable that will define how much data we will use in the 'Test' and 'Live' environments when applied in a filter.

First notice we have configured two environments.

Variables must be predefined and we must have a default value for each environment. It is not necessarily true that two environments have different default values. However, in this example, we wish to set our variable starting_year to an earlier value for use with the 'Live' environment. In the 'Test' environment, we require less data and can thus start collecting it from a much later year.

Now that we have defined it, our variable can normally be used in all expression editors and free text fields. In this example we use a Filter Component to limit our data set by the variable.

The filter is set up to filter any table rows that have value greater than ${starting_year) in the 'year' column. Due to how we defined default values for starting_year, if we run this job in the 'Test' environment, only flights from 2016 onward will remain. In the 'Live' environment, flights from 2006 onward will remain.

The SQL generated by this component and its settings can be viewed in the 'SQL' tab. The example below shows the SQL for the 'Live' environment.


Instead of relying on variables with default values, you can also use a Javascript expression. Javascript expressions do not require defining any variables in the environment. An example for the 'Live' environment is given below.

Which would generate the following SQL, given that the current year generated from the Javascript Date() function is 2016.