Create Table Component

Create or replace a table.

Warning: Depending on the chosen settings, this component is destructive. Take care when running this component as it may remove existing data.

Properties

Property Setting Description
Name Text The descriptive name for the component.
Project Select Select the project that the newly-created table will belong to.
Database Select Select the database that the newly-created table will belong to.
Schema Select Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see Schema Support.
Dataset Select Select the table dataset. For more information on using datasets, see the official documentation.
New Table Name Text The name of the table to create or replace.
Note: This is case-sensitive by default, as Matillion ETL uses quoted identifiers. To change this behaviour, please consult the documentation on case sensitivity in Snowflake.
Create/Replace Select Create - the default option; creates a new table. This will generate an error if a table with the same name already exists, but will never destroy existing data.
Create if not exists - creates a new table only if a table of the same name does not already exist. It will not destroy existing data.
If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct it. This could lead to errors later in the job if you did not expect a table of that name to already exist, or did not expect it to have a different schema from the one defined in this component.
Replace - drops any existing table of the same name, and then creates a new table. This guarantees that after the component succeeds the table matches the schema defined in this component; however, any existing data in an existing table will be lost.
Note: Since other database objects may depend upon this table,
drop ... cascade
is used, which may remove many other database objects in addition to the table itself.
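As an illustrative sketch, the three options correspond broadly to the following DDL patterns (the exact statements generated may differ; the table name and columns here are hypothetical):

```sql
-- Create: fails if t_airports already exists; never destroys data.
CREATE TABLE t_airports (iata VARCHAR(4), name VARCHAR(200));

-- Create if not exists: a no-op when the table already exists,
-- even if its schema differs from the one defined in the component.
CREATE TABLE IF NOT EXISTS t_airports (iata VARCHAR(4), name VARCHAR(200));

-- Replace: drops the old table (and, via CASCADE, any dependent
-- objects such as views) before recreating it. Existing data is lost.
DROP TABLE IF EXISTS t_airports CASCADE;
CREATE TABLE t_airports (iata VARCHAR(4), name VARCHAR(200));
```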
Table Metadata Column Name The name of the new column
Data Type Text - This type can hold any type of data, subject to a maximum size. More...
Integer - This type is suitable for whole-number types (no decimals). More...
Numeric - This type is suitable for numeric types, with or without decimals. More...
Boolean - This type is suitable for data that is either 'true' or 'false'. More...
Date - This type is suitable for dates without times. More...
DateTime - This type is suitable for dates, times or timestamps (both date and time). More...
Size For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. For Redshift, since all data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes.
For Numeric types, this is the total number of digits allowed, whether before or after the decimal point.
Decimal Places Relevant only for Numeric, it is the maximum number of digits that may appear to the right of the decimal point.
Encoding Type The Redshift compression encoding. See the Redshift Documentation for details of the various available encodings.
Note: Although you can manually specify the encoding type here, it is usually better to use automatic compression analysis to select the optimal compression. When loading into empty tables, this is performed by the S3 Load (unless you disable COMPUPDATE). It can also be performed by the Table Output component provided you are truncating the table.
Table Metadata Column Name The name of the new column
Data Type For more information on available BQ datatypes please refer to the Google Cloud documentation.
String: This type can hold any type of data, subject to a maximum size.
Integer: This type is suitable for whole-number types (no decimals).
Float: This type is suitable for numeric types, with or without decimals.
Boolean: This type is suitable for data that is either 'true' or 'false'.
Date: A formatted date object without time. See the Official GCP documentation
Time: A formatted time object without date. See the Official GCP documentation
DateTime: A formatted timestamp containing both date and time that is easily readable by the user. See the Official GCP documentation
Timestamp: This type is a timestamp left unformatted (exists as Unix/Epoch Time).
Mode The field mode. Default is 'NULLABLE'.
NULLABLE: Field allows null values
REQUIRED: Field does not accept null values
REPEATED: Field can accept multiple values
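As an illustrative sketch, the BigQuery data types and field modes above map onto DDL along these lines (dataset, table, and column names are hypothetical):

```sql
-- Illustrative BigQuery DDL combining the types and modes above.
CREATE TABLE my_dataset.t_airports (
  iata      STRING  NOT NULL,  -- REQUIRED mode: null values rejected
  name      STRING,            -- NULLABLE mode (the default)
  latitude  FLOAT64,
  longitude FLOAT64,
  runways   ARRAY<INT64>       -- REPEATED mode: multiple values per row
);
```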
Columns Column Name The name of the new column
Data Type VARCHAR - This type can hold any type of data, subject to a maximum size. More...
NUMBER - This type is suitable for whole-number types (no decimals). More...
FLOAT - This type is suitable for numeric types, with or without decimals. More...
BOOLEAN - This type is suitable for data that is either 'true' or 'false'. More...
DATE - This type is suitable for dates without times. More...
TIMESTAMP - This type is suitable for timestamps. More...
TIME - This type is suitable for times only. More...
VARIANT - A flexible type that can be used for any purpose.
Size For Text types, this is the maximum length. This is a limit on the number of bytes, not characters. Since all data is stored using UTF-8, any non-ASCII character will count as 2 or more bytes.
Precision Relevant only for Numeric, it is the maximum number of digits that may appear to the right of the decimal point.
Default Value The default value under this column for any row.
Not Null True if this column does not accept null values.
Primary Key Assign this column to be a primary key, declaring that its values can be used to uniquely identify rows. Primary key values are therefore unique by default. You can only have one primary key per table.
Unique Mark this column as unique, declaring that all of its values are distinct from one another. You may have multiple unique columns per table.
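Taken together, a column grid using the settings above translates into Snowflake DDL roughly as follows (an illustrative sketch; table and column names are hypothetical):

```sql
CREATE TABLE t_airports (
  iata      VARCHAR(4)    NOT NULL PRIMARY KEY,  -- Not Null + Primary Key
  name      VARCHAR(200),
  latitude  NUMBER(12, 8),                       -- Size 12, Precision 8
  longitude NUMBER(12, 8),
  country   VARCHAR(64)   DEFAULT 'Unknown',     -- Default Value
  icao      VARCHAR(4)    UNIQUE                 -- Unique
);
```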
Table Distribution Style Select Even - the default option, distribute rows around the Redshift Cluster evenly.
All - copy rows to all nodes in the Redshift Cluster.
Key - distribute rows around the Redshift cluster according to the value of a key column.
Table distribution is critical to good performance - see the Amazon Redshift documentation for more information.
Table Distribution Key Select This is only displayed if the Table Distribution Style is set to Key. It is the column used to determine which cluster node the row is stored on.
Table Sort Key Select This is optional, and specifies the columns from the input that should be set as the table's sort-key.
Sort-keys are critical to good performance - see the Amazon Redshift documentation for more information.
Sort Key Options Select Decide whether the sort key is of a compound or interleaved variety - see the Amazon Redshift documentation for more information.
Primary Key Select This is optional, and specifies the column(s) from the input that should be set as the table's primary key
Identity Columns Column Name The name of the column to be set as an identity column. This must match the name of a column defined in the table metadata, and the column must be of type Integer or Numeric with 0 decimal places. An identity column contains automatically generated values.
Seed Specifies the starting value.
Step Specifies the increment between values.
Backup Select Specify whether the created table is to be included in automated and manual cluster snapshots. The BACKUP NO setting has no effect on automatic replication of data to other nodes within the cluster, so tables with BACKUP NO specified are still restored in the event of a node failure. The default is BACKUP YES.
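On Redshift, the distribution, sort key, identity, and backup settings above correspond to DDL clauses roughly as follows (an illustrative sketch; table and column names are hypothetical):

```sql
CREATE TABLE t_airports (
  airport_id BIGINT IDENTITY(1, 1),  -- Identity column: Seed 1, Step 1
  iata       VARCHAR(4) NOT NULL,
  name       VARCHAR(200)
)
BACKUP YES                   -- Include in cluster snapshots (the default)
DISTSTYLE KEY                -- Table Distribution Style: Key
DISTKEY (iata)               -- Table Distribution Key
COMPOUND SORTKEY (iata);     -- Table Sort Key + Sort Key Options
```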
Clustering Keys Select Specify clustering key to use. This allows for more efficient sorting and scanning of very large tables. More...
Data Retention Time In Days Text Number of days data is retained after deletion such that it may be restored.
Comment Text Attaches a comment to the table. More...
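The Snowflake-specific table options above (clustering keys, retention time, and comment) appear in the generated DDL roughly as follows (an illustrative sketch; names are hypothetical):

```sql
CREATE TABLE t_airports (
  iata       VARCHAR(4),
  name       VARCHAR(200),
  updated_at DATE
)
CLUSTER BY (updated_at)             -- Clustering Keys
DATA_RETENTION_TIME_IN_DAYS = 1     -- Data Retention Time In Days
COMMENT = 'Airport lookup table';   -- Comment
```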
Partitioning Select Choose whether or not to create a date-partitioned table. Selecting 'Day' will create the table as a partitioned table, loading data into separate date partitions that can be queried using the '_PARTITIONTIME' pseudo-column. This allows large data sets to be split into multiple parts, with each part corresponding to a single day of data. For more information on date-partitioned tables, consult the GCP documentation.
Enabling partitioning on a table reveals additional options in the 'Table Input' transformation component that allow the partition column to be included in the input.
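As an illustrative sketch, an ingestion-time partitioned table in BigQuery and a query against a single day's partition look roughly like this (dataset and table names are hypothetical):

```sql
-- Create a table partitioned by ingestion date.
CREATE TABLE my_dataset.t_events (event_name STRING)
PARTITION BY _PARTITIONDATE;

-- Query one day's partition via the pseudo-column.
SELECT event_name
FROM my_dataset.t_events
WHERE _PARTITIONTIME = TIMESTAMP('2017-01-01');
```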

Variable Exports

This component makes the following values available to export into variables:

Source Description
Table Recreated Whether or not the table was (re)created. This is useful when "Create/replace" is set to "Create if not exists" so you can tell if the table needed to be created.

Strategy

Generates any required DDL to manage an existing table of the same name, followed by a CREATE TABLE statement.

Example

Here we create a table to hold the airport data.


We create the table using the name t_airports, replacing it if necessary. Since this is a small lookup table, it is distributed to all nodes in the cluster (All) and sorted by its primary key, "iata".

A suitable schema is defined to hold the necessary data. Eight decimal places are allowed in the latitude/longitude fields.

Example

We create the table using the name t_airports and edit its metadata to suit our needs.


The schema defined by the metadata ensures the latitude and longitude columns are floats rather than strings.
