S3 Unload Component

Note: This feature is only available for instances hosted on AWS.

Creates files in a specified S3 bucket and loads them with data from a table or view.

By default, your data will be unloaded in parallel, creating separate files for each slice on your cluster.

Note: This component is similar in effect to the 'Text Output' component. Since S3 Unload unloads data in parallel directly from the data warehouse (Redshift or Snowflake) to S3, it tends to be faster than using Text Output. However, S3 Unload sacrifices some of the added functionality that comes from Text Output pulling the data through the Matillion ETL instance (such as adding column headers to each file).
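Under the hood, the component issues the warehouse's native unload command. The following is a minimal sketch, in Python, of the shape of a Redshift UNLOAD statement for an equivalent parallel unload; the bucket, role ARN, and helper function are illustrative, not part of the product:

```python
def build_unload_sql(query, s3_url, iam_role, parallel=True, compress=True):
    """Compose a Redshift UNLOAD statement of the general shape the
    component generates from its property values (illustrative only)."""
    options = ["IAM_ROLE '%s'" % iam_role,
               "PARALLEL %s" % ("ON" if parallel else "OFF")]
    if compress:
        options.append("GZIP")
    # Single quotes inside the query must be doubled inside UNLOAD's literal.
    escaped = query.replace("'", "''")
    return "UNLOAD ('%s') TO '%s' %s" % (escaped, s3_url, " ".join(options))

sql = build_unload_sql(
    "SELECT * FROM t_carriers",                    # table from the example below
    "s3://my-bucket/carriers_unload_",             # hypothetical bucket/prefix
    "arn:aws:iam::123456789012:role/my-redshift",  # hypothetical role ARN
)
```

With 'Parallel' enabled, Redshift writes one file per slice; disabling it corresponds to PARALLEL OFF and a single output file.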

Properties

Property Setting Description
Name Text The descriptive name for the component.
Stage Select Choose a predefined stage for your data. These stages must be created from your Snowflake account console. Otherwise, 'Custom' can be chosen for the staging to be based on the component's properties.
S3 Object Prefix Text/Select The name of the file, without file extension, into which data will be unloaded.
File Prefix Text Filename prefix for unloaded data to be named on the S3 bucket. Each file will be named as the prefix followed by a number denoting which node this was unloaded from. All unloads are parallel and will use the maximum number of nodes available at the time.
Encryption Select (AWS Only) Decide how the files are encrypted inside the S3 bucket. This property is available when using an Existing Amazon S3 Location for Staging.
None: No encryption.
SSE KMS: Encrypt the data according to a key stored on KMS.
SSE S3: Encrypt the data according to a key stored on an S3 bucket.
KMS Key ID Select (AWS Only) The ID of the KMS encryption key you have chosen to use in the 'Encryption' property.
Master Key Select The ID of the server side encryption key you have chosen to use in the 'Encryption' property.
Warehouse Select Choose a Snowflake warehouse that will run the load.
Database Select Choose a database to create the new table in.
Schema Select Select the table schema. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.
Note: An external schema is required if the 'Type' property is set to 'External'.
Table name Text The table or view to unload to S3.
Format Select Choose from preset file formats available in your Snowflake database. Additional file formats can be created using the Create File Format component. Selecting the [Custom] file format will use S3 Unload's properties to define the file format.
File Type Select Choose whether you would like Matillion ETL to unload the data in a CSV or JSON format.
S3 URL Location Text The URL of the S3 bucket to load the data into.
S3 Object Prefix Text Create data files in S3 beginning with this prefix. The format of the output is
<prefix><slice-number>_part_<file-number>
Where slice-number is the number of the slice in your cluster, and file-number increments when a file exceeds the 6.2 GB limit and is split into multiple parts.
IAM Role ARN Text Supply the value of a role ARN that is already attached to your Redshift cluster, and has the necessary permissions to access S3. This is optional, since without this style of setup, the credentials of the environment (instance credentials or manually entered access keys) will be used.
See the Redshift documentation for more information about using a Role ARN with Redshift.
Generate Manifest Select Whether or not to generate a manifest file detailing the files that were added.
Data File Type Select Can either be Delimited or Fixed Width.
Compression Select Whether the files are compressed using GZIP, BROTLI, BZ2, DEFLATE, RAW_DEFLATE, or ZSTD, or not compressed at all.
Delimiter Text The delimiter that separates columns. The default is a Comma. A [TAB] character can be specified as "\t".
Record Delimiter Text The delimiter to be used that separates records (rows) in the file. Defaults to newline. \n can also signify a newline. \r can signify a carriage return.
Field Delimiter Text The delimiter that separates columns. The default is a Comma. A [TAB] character can be specified as "\t".
Fixed Width Spec Text Loads the data from a file where each column width is a fixed length, rather than separated by a delimiter. Each column is described by a name and length, separated by a colon. Each described column is then separated by a comma.
For example, suppose we have four columns: name, id, age, and state, with respective lengths 12, 8, 2, and 2.
The specification to convert this data into a table using fixed-width columns would then be:
name:12,id:8,age:2,state:2

Note that the columns can have any plaintext name. For more information on fixed width inputs, please consult the AWS documentation.
Date Format Text Defaults to 'auto' - this can be used to manually specify a date format.
Time Format Text Defaults to 'auto' - this can be used to manually specify a time format.
Timestamp Format Text Defaults to 'auto' - this can be used to manually specify a timestamp format.
Compress Data Select Whether or not the resultant files are to be compressed.
NULL As Text This option replaces the specified string with null in the output table. Use this if your data has a particular representation of missing data.
Escape Select Whether or not to insert backslashes to escape special characters. This is often a good idea if you intend to re-load the data back into a table later, since the COPY also supports this option.
Escape Unenclosed Field Text Single character string used as the escape character for unenclosed field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default). If a character is specified in the 'Escape' field, it will override this field.
Field Optionally Enclosed Text A character that is used to enclose strings. Can be single quote (') or double quote (") or NONE (default). Note that the character chosen can be escaped by that same character.
Null If Text This option replaces the specified string with null in the output table. Use this if your data has a particular representation of missing data.
Allow Overwrites Select If the target file already exists, overwrite data instead of generating an error.
Single File Select If set to TRUE, the unload will work in serial rather than parallel. This results in a slower unload but a single, complete file.
Parallel Select If set, the unload will work in parallel, creating multiple files (one for each slice of the cluster). Disabling parallel will result in a slower unload but a single, complete file.
Add quotes Select If set, quotation marks are added to the data.
Max File Size Text The maximum size of each file generated, per thread. For Redshift, the default is 16 MB; for Snowflake, the default is 16000000 bytes (16 MB). Both platforms impose a 6.2 GB per-file limit on unload operations. Files that exceed the stated maximum will be split into multiple size-abiding parts.
Include Headers Select If set to TRUE, write column names as headers at the top of the unloaded files.
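The Fixed Width Spec format described in the properties above can be sketched in a few lines of Python. The helper functions and sample row are illustrative only, not part of the product:

```python
def parse_fixed_width_spec(spec):
    """Parse a spec such as 'name:12,id:8,age:2,state:2' into (name, length) pairs."""
    columns = []
    for part in spec.split(","):
        name, length = part.split(":")
        columns.append((name, int(length)))
    return columns

def split_fixed_width_row(row, columns):
    """Slice one fixed-width line into a dict keyed by column name."""
    fields, offset = {}, 0
    for name, length in columns:
        fields[name] = row[offset:offset + length].strip()
        offset += length
    return fields

# The spec from the example above: four columns of widths 12, 8, 2, and 2.
cols = parse_fixed_width_spec("name:12,id:8,age:2,state:2")
# A sample 24-character row (illustrative data, padded to the column widths).
fields = split_fixed_width_row("John Smith  0000123434CA", cols)
```

Each field is recovered purely by position, which is why the declared widths must match the data exactly.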

Example

In this example, we have a table 't_carriers' and we wish to unload the data to an S3 bucket for long-term storage. To this end, we use the S3 Unload component in a job.

In the S3 Unload component, an S3 URL is set, and an object prefix is set to 'carriers_unload'. This means that files will be created on the S3 bucket with the common name of 'carriers_unload' followed by the slice number (if 'Parallel' is enabled, which it is) and part number of the file. For example: 'carriers_unload_3_part_2'.
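The file naming described above can be predicted ahead of time. This tiny helper is illustrative only, following the pattern in the example:

```python
def unload_file_name(prefix, slice_number, part_number):
    """Build a file name following the pattern from the example:
    <prefix>_<slice-number>_part_<part-number> (illustrative helper)."""
    return "%s_%d_part_%d" % (prefix, slice_number, part_number)

# Reproduces the example file name from the text above.
name = unload_file_name("carriers_unload", 3, 2)
```

Knowing the expected names is useful when a downstream job needs to pick up the unloaded parts from the bucket.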

Note that although the table is unloaded, it still exists and must be cleaned up separately.