Aggregate Component

Aggregate Component

This component works by grouping together multiple input rows into a single output row. Input columns can be added to the groupings, or have an aggregation applied to them.

Default output names are chosen by combining the source column name and the aggregation type. If they are not appropriate, consider using a Rename Component immediately after this component to choose more appropriate names.

Properties

Property Setting Description
Name Text The descriptive name for the component.
Groupings Columns One or more source columns that form the groupings.
The output flow will contain one row for each distinct combinations of values within the grouping columns.
Aggregations Source Column The input column for the summary function (the summary function is the aggregation type).
Aggregation Type You can add the same source column multiple times each with a different aggregation type. The summary is calculated for each combination of the source column values.
Min: Find the Minimum value of the source column.
Max: Find the Maximum value of the source column.
Count: Count the records whose source column is not null.
Count Distinct: Count the distinct values of the source column.
Approximate Count: This is variant of Count Distinct that may have better performance in cases where an exact count is not required.
See the Redshift Documentation for more details.

Sum: Sum the values in the source column. This is only sensible for numeric source columns.
Average: Average the values in the source column. This is only sensible for numeric source columns.
Standard Deviation (Sample): Calculates the sample standard deviation of the source column. This is only sensible for numeric source columns.
Standard Deviation (Population): Calculates the population standard deviation of the source column. This is only sensible for numeric source columns.
Variance (Sample): Calculates the sample variance of the source column. This is only sensible for numeric source columns.
Variance (Population): Calculates the population variance of the source column. This is only sensible for numeric source columns.
For more information on the Aggregation Types, see the Redshift Documentation.
For more information on Aggregation functions in BigQuery, see the official documentation.

Strategy

Generates summary (aggregate) functions along with a group-by clause.

Example

This job counts the number of flights and totals the distance travelled for each tail number in the input data.

Groupings are set to Tail Number (tailnum).

Aggregations are set to total the distance, and count the records whose flight number is not null.

The sample data confirms that the data is being summarised as expected.