Join Component

Join Component

Join 2 or more input flows into a single output.

Join conditions are defined using expressions which can be calculated from the input columns. Join expressions can be simple or complex.

Properties

Property Setting Description
Name Text The descriptive name for the component.
Main Table Select The name of the input table. The inputs connected to the component are provided to choose from.
Main Table Alias Text Since you can join a table to itself, all inputs require a unique alias, and the Main Table is no exception. Select an alias for the Main Table, for example A.
You can repeat the Main Table name if it is simple.
Joins Join Table The name of an input table.
The inputs connected to the component are provided to choose from. You can add the same table multiple times to perform a self-join, providing that you assign a unique alias each time.
Join Alias Since you can join a table to itself, all inputs require a unique alias, and the Main Table is no exception. Select an alias for the Main Table, for example B.
You can repeat the name of the input if required.
Join Type The Join type specifies how the join is to be performed. Joins are performed in order, top-to-bottom, starting with the Main Table.
Inner – Performs an SQL Inner Join.
Left – Performs an SQL Left Join.
Right – Performs an SQL Right Join.
Full – Performs an SQL Full Outer Join.
Join Expressions List of Expressions A list of expressions specifying how each join is performed. This is the same expression editor used by the Calculator component.
Each expression must be valid SQL and can use the all the built in Redshift Functions
Each expression must be valid SQL and can use the all the built in BigQuery Functions

Each expression must be valid SQL and can use the all the built in Snowflake Functions

There is exactly one expression for each Join, and the result of the expression is evaluated as True or False which indicates whether the two records being compared 'match'. Often this will be a simple 'equality' condition, but it could be more complex, e.g. where a date falls within a start/end date range.
To force a cartesian product (not recommended!) you still need to complete an expression, but you can specify a trivially true condition such as 1=1.
Output Columns Input Column The name of an input column.
There may be duplicate names in input flows, which is why it is important to have unique aliases. The alias name is shown to help distinguish the names
Output Column The output columns must be unique, so you have the opportunity to rename any of the available input columns.
Hint: If you later add another table, the output columns are not re-populated, so you will have to add any additional output columns that have been made available.

Strategy

Generates a select query with multiple inputs in the FROM clause, joined with an ON clause generated from the Join Expressions

Example

This job adds plane information onto the flights data, by matching the Tail Number.

The two input tables are defined, and both connected to the Join component.

The flights table is selected as the Main Table, and its alias is also set to flights.

The plane_info table is selected as the first and only join. Since we only want flights whose tail number is known (data in real life is unfortunately not always perfect!), an inner join is chosen.

The Join Expression is simply matching the tail number from each input flow. Note that it is the alias that is used in the field selection, not the input names.

Finally, all output columns are selected (this is done automatically the first time all the parameters pass validation).

Video