Create Table Component
Create or replace a Redshift table.
Warning: Depending on the chosen settings, this component is destructive. Take care when running this component as it may remove existing data.
|Name||Text||The descriptive name for the component.|
|Schema||Select||Select the table schema. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.|
|New Table Name||Text||The name of the table to create or replace.|
|Create/Replace||Select||Create - the default option, creates a new table. This will generate an error if a table with the same name already exists, but will never destroy existing data.
Create if not exists - This will only create a new table if a table of the same name does not already exist. It will not destroy existing data.
If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct it, which could lead to errors later in the job if you did not expect an existing table to exist, or to have a different schema to the one defined in this component.
Replace - This drops any existing table of the same name, and then creates a new table. This guarantees that after the component succeeds the table matches the schema defined in this component, however any existing data in an existing table will be lost.
Note: Since other database objects may depends upon this table,
drop ... cascadeis used which may actually remove many other database objects.
|Table Metadata||Column Name||The name of the new column|
|Data Type||Text - This type can hold any type of data, subject to a maximum size. More...
Integer - This type is suitable for whole-number types (no decimals). More...
Numeric - This type is suitable for numeric types, with or with decimals. More...
Boolean - This type is suitable for data that is either 'true' or 'false'. More...
Date - This type is suitable for dates without times. More...
DateTime - This type is suitable for dates, times or timestamps (both date and time). More...
|Size||For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. For Redshift, this is important, since all data is stored using UTF-8 any non-ASCII character will count as 2 or more bytes.
For Numeric types, this is the total number of digits allowed, whether before or after the decimal point.
|Decimal Places||Relevant only for Numeric, it is the maximum number of digits that may appear to the right of the decimal point.|
|Encoding Type||The Redshift compression encoding. See the Redshift Documentation for details of the various available encodings.
Note: Although you can manually specify the encoding type here, it is usually better to use automatic compression analysis to select the optimal compression. When loading into empty tables, this is performed by the S3 Load (unless you disable COMPUPDATE). It can also be performed by the Table Output component provided you are truncating the table.
However: Not selecting an Encoding Type or selecting 'None' will not guarantee this data not being compressed. Rather, it means Redshift will choose its own encoding method. Generally, Redshift will opt for LZO or RAW compression, depending on the column and datatype. For more information, see the Redshift documentation.
|Table Distribution Style||Select||Even - the default option, distribute rows around the Redshift Cluster evenly.
All - copy rows to all nodes in the Redshift Cluster.
Key - distribute rows around the Redshift cluster according to the value of a key column.
Table-distribution is critical to good performance - see the Amazon Redshift documentation for more information.
|Table Distribution Key||Select||This is only displayed if the Table Distribution Style is set to Key. It is the column used to determine which cluster node the row is stored on.|
|Table Sort Key||Select||This is optional, and specifies the columns from the input that should be set as the table's sort-key.
Sort-keys are critical to good performance - see the Amazon Redshift documentation for more information.
|Sort Key Options||Select||Decide whether the sort key is of a compound or interleaved variety - see the Amazon Redshift documentation for more information.|
|Primary Key||Select||This is optional, and specifies the column(s) from the input that should be set as the table's primary key|
|Identity Columns||Column Name||The name of the column, which must match the name of a column defined in the table metadata, which will be set as an identity column. An identity column contains automatically generated values. The specified column must be of type Integer or Numeric with 0 decimal places.|
|Seed||Specifies the starting value.|
|Step||Specifies the increment between values|
|Backup||Select||Specify whether the created table is to be included in automated and manual cluster snapshots. The BACKUP NO setting has no effect on automatic replication of data to other nodes within the cluster, so tables with BACKUP NO specified are restored in a node failure. The default is BACKUP YES.|
This component makes the following values available to export into variables:
|Table Recreated||Whether or not the table was (re)created. This is useful when "Create/replace" is set to "Create if not exists" so you can tell if the table needed to be created.|
StrategyGenerates any required DDL to manage an existing table of the same name, followed by a CREATE TABLE statement.
Here we create a table to hold the airport data.
We create the table using the name t_airports, replacing it if necessary. Since this is a small lookup table, it is distributed to all nodes in the cluster (All) and sorted by its primary key, "iata".
A suitable schema is defined to hold the necessary data. 8 Decimal places are allowed for in the latitude/longitude fields.