Create Table Component

Create or replace a table.

Warning: Depending on the chosen settings, this component is destructive. Take care when running this component as it may remove existing data.

Properties

Property Setting Description
Name Text The descriptive name for the component.
Project Select Select the project that the newly created table will belong to.
Schema Select Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support.
Dataset Select Select the table dataset. For more information on using datasets, see the official documentation.
New Table Name Text The name of the table to create or replace.
Create/Replace Select Create - the default option, creates a new table. This will generate an error if a table with the same name already exists, but will never destroy existing data.
Create if not exists - This will only create a new table if a table of the same name does not already exist. It will not destroy existing data.
If the schema of an existing table does not match the schema defined in this component, no attempt is made to correct it, which could lead to errors later in the job if you did not expect the table to already exist or to have a different schema.
Replace - This drops any existing table of the same name, and then creates a new table. This guarantees that after the component succeeds the table matches the schema defined in this component, however any existing data in an existing table will be lost.
Note: Since other database objects may depend upon this table, drop ... cascade is used, which may remove other database objects in addition to the table itself.
Table Metadata Column Name The name of the new column
Data Type Text - This type can hold any type of data, subject to a maximum size.
Integer - This type is suitable for whole-number types (no decimals).
Numeric - This type is suitable for numeric types, with or without decimals.
Boolean - This type is suitable for data that is either 'true' or 'false'.
Date - This type is suitable for dates without times.
DateTime - This type is suitable for dates, times or timestamps (both date and time).
Size For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. For Redshift this matters because all data is stored using UTF-8, so any non-ASCII character will count as 2 or more bytes.
For Numeric types, this is the total number of digits allowed, whether before or after the decimal point.
Decimal Places Relevant only for Numeric types, this is the maximum number of digits that may appear to the right of the decimal point.
Encoding Type The Redshift compression encoding. See the Redshift Documentation for details of the various available encodings.
Note: Although you can manually specify the encoding type here, it is usually better to use automatic compression analysis to select the optimal compression. When loading into empty tables, this is performed by the S3 Load (unless you disable COMPUPDATE). It can also be performed by the Table Output component provided you are truncating the table.
Table Metadata Column Name The name of the new column
Data Type For more information on available BigQuery data types, please refer to the Google Cloud documentation.
String: This type can hold any type of data, subject to a maximum size.
Integer: This type is suitable for whole-number types (no decimals).
Float: This type is suitable for numeric types, with or without decimals.
Boolean: This type is suitable for data that is either 'true' or 'false'.
Timestamp: This type is suitable for dates, times or timestamps (both date and time).
Mode The field mode. Default is 'NULLABLE'.
NULLABLE: Field allows null values
REQUIRED: Field does not accept null values
REPEATED: Field can accept multiple values
Table Distribution Style Select Even - the default option, distribute rows around the Redshift Cluster evenly.
All - copy rows to all nodes in the Redshift Cluster.
Key - distribute rows around the Redshift cluster according to the value of a key column.
Table distribution is critical to good performance - see the Amazon Redshift documentation for more information.
Table Distribution Key Select This is only displayed if the Table Distribution Style is set to Key. It is the column used to determine which cluster node the row is stored on.
Table Sort Key Select This is optional, and specifies the columns from the input that should be set as the table's sort key.
Sort keys are critical to good performance - see the Amazon Redshift documentation for more information.
Sort Key Options Select Decide whether the sort key is of a compound or interleaved variety - see the Amazon Redshift documentation for more information.
Primary Key Select This is optional, and specifies the column(s) from the input that should be set as the table's primary key.
Identity Columns Column Name The name of the column to be set as an identity column; it must match the name of a column defined in the table metadata. An identity column contains automatically generated values. The specified column must be of type Integer, or Numeric with 0 decimal places.
Seed Specifies the starting value.
Step Specifies the increment between values.
Backup Select Specify whether the created table is to be included in automated and manual cluster snapshots. The BACKUP NO setting has no effect on automatic replication of data to other nodes within the cluster, so tables with BACKUP NO specified are still restored in the event of a node failure. The default is BACKUP YES.
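
Taken together, the Redshift-related properties above map onto a CREATE TABLE statement along the following lines. This is only a sketch of the kind of DDL involved; the table name, column names, sizes, and encodings are illustrative placeholders rather than output captured from the component.

    CREATE TABLE my_schema.my_table (
        order_id    INTEGER IDENTITY(1, 1),      -- Identity column: seed 1, step 1
        customer_id INTEGER ENCODE az64,         -- Integer: whole numbers only
        notes       VARCHAR(256) ENCODE lzo,     -- Text: Size 256 is a byte limit, not a character count
        amount      NUMERIC(10, 2) ENCODE az64,  -- Numeric: Size 10 digits in total, 2 decimal places
        is_paid     BOOLEAN,                     -- Boolean: 'true' or 'false'
        order_date  DATE,                        -- Date: a date without a time
        updated_at  TIMESTAMP                    -- DateTime: date and time
    )
    BACKUP NO                                    -- exclude the table from cluster snapshots
    DISTSTYLE KEY
    DISTKEY (customer_id)                        -- Table Distribution Key
    COMPOUND SORTKEY (customer_id, order_date);  -- Table Sort Key, compound variety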
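
For the BigQuery variant, the Data Type and Mode settings correspond roughly to standard SQL DDL of the following shape. Again, the dataset, table, and column names are placeholders chosen for illustration.

    CREATE TABLE my_dataset.my_table (
        name     STRING NOT NULL,  -- String with REQUIRED mode: null values are rejected
        quantity INT64,            -- Integer with the default NULLABLE mode
        price    FLOAT64,          -- Float
        is_paid  BOOL,             -- Boolean
        sold_at  TIMESTAMP,        -- Timestamp
        tags     ARRAY<STRING>     -- String with REPEATED mode: multiple values per row
    );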

Variable Exports

This component makes the following values available to export into variables:

Source Description
Table Recreated Whether or not the table was (re)created. This is useful when "Create/Replace" is set to "Create if not exists", so you can tell whether the table needed to be created.

Strategy

Generates any required DDL to manage an existing table of the same name, followed by a CREATE TABLE statement.
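
For example, with Create/Replace set to Replace on the Redshift variant, the generated DDL is broadly of the following shape; the table and column names here are placeholders, not actual component output.

    -- Replace: drop any existing table of the same name (and its dependents), then recreate it
    DROP TABLE IF EXISTS my_schema.my_table CASCADE;
    CREATE TABLE my_schema.my_table (id INTEGER);

    -- Create if not exists: a single statement that leaves any existing table untouched
    CREATE TABLE IF NOT EXISTS my_schema.my_table (id INTEGER);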

Example

Here we create a table to hold the airport data.

We create the table using the name t_airports, replacing it if necessary. Since this is a small lookup table, it is distributed to all nodes in the cluster (All) and sorted by its primary key, "iata".

A suitable schema is defined to hold the necessary data. Eight decimal places are allowed in the latitude/longitude fields.
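
A sketch of the resulting DDL is shown below. Only t_airports, iata, latitude, and longitude are named above, so the remaining columns, the sizes, and the NUMERIC precision are illustrative assumptions.

    CREATE TABLE my_schema.t_airports (
        iata      VARCHAR(3),
        airport   VARCHAR(256),    -- assumed descriptive column
        city      VARCHAR(256),    -- assumed descriptive column
        latitude  NUMERIC(11, 8),  -- 8 decimal places
        longitude NUMERIC(11, 8),  -- 8 decimal places
        PRIMARY KEY (iata)
    )
    DISTSTYLE ALL                  -- small lookup table: copied to every node
    SORTKEY (iata);                -- sorted by its primary key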

Example

We create the table using the name t_airports and edit its metadata to suit our needs.

The schema defined by the metadata ensures the latitude and longitude columns are floats rather than strings.

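A minimal sketch of the corresponding BigQuery DDL, assuming a dataset named my_dataset and a couple of illustrative descriptive columns alongside those mentioned above:

    CREATE TABLE my_dataset.t_airports (
        iata      STRING,
        airport   STRING,    -- assumed descriptive column
        city      STRING,    -- assumed descriptive column
        latitude  FLOAT64,   -- float rather than string
        longitude FLOAT64    -- float rather than string
    );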