S3 Load Component
S3 Load Component
Load data into an existing table from objects stored in Amazon Simple Storage Service (Amazon S3).
Many of the configuration settings on this component have sensible defaults, mirroring the defaults provided by Redshift when that option is not specified. Mandatory settings are:
- Target Table Name
- Load Columns
- S3 URL Location
- S3 Object Prefix
- Data File Type
In addition, it is likely you will need to confirm the following settings:
- Compression Method
- Ignore Header Rows
Note: This component requires working AWS Credentials with read access to the bucket containing the source data file(s). The is easily achieved by attaching an IAM role to the instance when launching Matillion ETL for Redshift, however it can also be managed manually by editing an Environment. See the example.
Furthermore, Matillion requires use of a policy that contains the s3:ListBucket action such as the policy provided in the Managing Credentials documentation
|For more information on all the settings in this component, see the Amazon Redshift COPY syntax for more information.|
|Name||Text||The descriptive name for the component.|
|Schema||Select||Select the table schema. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.|
|Target Table||Select||Select an existing table to load data into.|
|Load Columns||Select Multiple||One or more columns that exist in the target table.|
|S3 URL Location||S3 Tree||The URL of the S3 bucket to get the files from. This follows the format s3://bucket-name/location, where location is optional.|
|S3 Object Prefix||Text||All files that begin with this prefix will be included in the load into the target table.|
|Encrypted Data||Select||Select Yes if the S3 data is encrypted with client-side encryption (server side encryption, SSE, is transparent to Amazon Redshift).|
|Encryption Key||Text||Provide the 'master_symmetric_key' as outlined in the Redshift Documentation.|
|IAM Role ARN||Text||Supply the value of a role ARN that is already attached to your Redshift cluster,
and has the necessary permissions to access S3. This is optional, since without this
style of setup, the credentials of the environment (instance credentials or manually
entered access keys) will be used.
See the Redshift documentation for more information about using a Role ARN with Redshift.
|Data Type||Select||Available options are:
Fixed Width - this requires an additional "Fixed Width Spec". See the amazon documentation for details.
JSON - this requires an additional "JSON Format". See the amazon documentation for details.
Avro - this requires an additional "Avro Format". See the amazon documentation for details.
|Delimiter||Text||The delimiter that separates columns. The default is a Comma. A [TAB] character can be specified as "\t".|
|Fixed Width Spec||Text||Loads the data from a file where each column width is a fixed length, rather than separated by a delimiter.|
|CSV Quoter||Text||Specifies the character to be used as the quote character when using the CSV option.|
|JSON Layout||Text||Defaults to 'auto' which should work for the majority of JSON files if the fields match the table field names. Optionally can specify the URL to a JSONPaths file to map the data elements in the JSON source data to the columns in the target table.|
|AVRO Layout||Text||Defaults to 'auto' which should work for the majority of Avro files if the fields match the table field names. Optionally can specify the URL to a JSONPaths file to map the data elements in the Avro source data to the columns in the target table.|
|Region||Select||The Amazon S3 region hosting the S3 bucket. This is not normally required and can be left as "None" if the bucket is in the same region as your Redshift cluster.|
|Compression Method||Select||Whether the input file is compressed in GZIP format, LZOP format, or not compressed at all.|
|Encoding||Select||The encoding the data is in. This defaults to UTF-8.|
|Remove Quotes||Select||Whether to remove any quotes surrounding data values. Note: This option is not available if the chosen data type is CSV.|
|Replace Invalid Characters||Text||If there are any invalid unicode characters in the data, this parameter specified the single character replacement for them. Defaults to '?'.|
|Maximum Errors||Text||The maximum number of individual parsing errors that cause the whole load to fail. Values up to this will be substituted as null values.This value defaults to 0, but the Amazon default is 1000.|
|Date Format||Text||Defaults to 'auto' - this can be used to manually specify a date format.|
|Time Format||Text||Defaults to 'auto' - this can be used to manually specify a time format.|
|Ignore Header Rows||Text||The number of rows at the top of the file to ignore - defaults to 0.|
|Accept Any Date||Select||If this is enabled, invalid dates such as '45-65-2018' are not considered an error, but will be loaded as the null value.|
|Ignore Blank Lines||Select||If this is set, any blank lines in the input file are ignored.|
|Truncate Columns||Select||If this is set, any instance of data in the input file that is too long to fit into the specified target column width will be truncated to fit instead of causing an error.|
|Fill Record||Select||Allows data files to be loaded when contiguous columns are missing at the end of some of the records. The remaining columns are set to null.|
|Trim Blanks||Select||Removes trailing and leading whitespace from the input data.|
|Null As||Text||This option replaces the specified string with null in the output table. Use this is your data has a particular representation of missing data.|
|Empty As Null||Select||If this is set, empty columns in the input file will become NULL.|
|Blanks As Null||Select||If this is set, blank columns in the input file will become NULL.|
|Comp Update||Select||Controls whether compression encodings are automatically applied during a COPY. This is usually a good idea to optimise the compression used when storing the data.|
|Escape||Select||When this option is specified, the backslash character (\) in input data is treated as an escape character.|
|Stat Update||Select||Governs automatic computation and refresh of optimizer statistics at the end of a successful COPY command.|
|Round Decimals||Select||If this option is set, round any decimals to fit into the column when the number of decimal places in the input data is larger than defined for the target column.|
|Manifest||Select||Whether or not the given object prefix is that of a manifest file. See the Redshift documentation for more information.|
|Explicit IDs||Select||Whether or not to load data from the S3 Objects into an IDENTITY column. See the Redshift documentation for more information.|
This example loads the Airports data into a new table.
The data files in this case are comma-separated, GZIP compressed files with a header row.
On completion, the number of rows loaded is shown in the status.
You can check that your instance credentials have S3 access, or define additional credentials, by editing the Environment. See this page for further details.