Distinct Component

Distinct Component

Pass on only the distinct (or unique) input records to the next component.

Properties

Property Setting Description
Name Text The descriptive name for the component.
Columns Selection Only these selected columns are passed to the next component. Duplicate records are removed, leaving only distinct values.

Note: On Redshift, since records may be located on many nodes, determining uniqueness can be expensive.

Strategy

Generates a select distinct query.

Example

In this example, we have two tables of data that contain project tasks from past months. We want to bring these tables together using a Unite component so that we have just 1 complete table of data. Unfortunately, the dates from the past data tables have some overlap and this means we end up with duplicate rows when uniting the two tables. In order to solve this problem, we use a Distinct component to filter out any duplicate rows. The Transformation job is shown below.

Below we note the number of rows in each input table with the 3rd row count (inside the Unite component) being the sum of these counts. A quick way of checking the success of the Distinct component will be a reduction in row count as duplicates are removed.

In the Distinct component properties, all columns are added so that they all appear in the output.

When run, this component will take in these columns from the input, keep the unique rows and output them. The resulting sample and row count is shown below.