S3 Put Object Component

Note: This feature is only available for instances hosted on AWS.

Transfer a file from a remote host onto Amazon S3.

This component can use a number of common network protocols to transfer data to an S3 bucket. It copies the target file rather than moving it. In all cases, the source data is specified with a URL.

Currently supported protocols are:

  • FTP
  • HDFS
  • HTTP
  • HTTPS
  • SFTP
  • Windows Fileshare
  • S3 Bucket
  • Google Cloud Storage
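
Conceptually, the component performs a streamed copy from the source URL to an object in the target bucket. The minimal Python sketch below (an illustration, not Matillion's implementation) shows this behaviour for the HTTP protocol using the requests and boto3 libraries; the URL, bucket, and object names are hypothetical placeholders.

    # Minimal sketch of an HTTP-to-S3 copy (not Matillion's implementation).
    # The URL, bucket, and key below are hypothetical placeholders.
    import boto3
    import requests

    SOURCE_URL = "https://example.com/data/vehicles.zip"
    BUCKET = "my-staging-bucket"
    KEY = "landing/vehicles.zip"

    s3 = boto3.client("s3")

    # Stream the download so large files are not held fully in memory,
    # then upload the stream to S3. The source file is untouched (copy, not move).
    with requests.get(SOURCE_URL, stream=True) as response:
        response.raise_for_status()
        s3.upload_fileobj(response.raw, BUCKET, KEY)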

Properties

  • Name (Text): The descriptive name for the component.
  • Input Data Type (Choice): Choose a connection protocol from the options available.
  • Set Home Directory as Root (Choice): Used with (S)FTP. By default, URLs are relative to the user's home directory. This option tells Matillion ETL that the given path is from the server root.
  • Input Data URL (Text): The URL, including full path and file name, that points to the file to download onto a bucket. The format of the URL varies considerably; however, a default 'template' is offered once you have chosen a connection protocol. Note: Special characters used in this field (e.g. in usernames and passwords) must be URL-safe. See documentation on URL Safe Characters for more information. This can be avoided by using the Username and Password properties.
  • Unpack ZIP file (Select): If this option is chosen, the input is treated as a ZIP file. Its contents are extracted on the instance and all content is then uploaded to S3. Since a ZIP may contain multiple files, the original names are retained, so you cannot choose the target filename.
  • Output Object Name (Text): The object (file) name in the target S3 Path. This does not need to match the source filename; indeed, for some input protocols there is no source filename.
  • Domain (Text): The domain that the host file is located on. This parameter only appears when the data type is "Windows Fileshare".
  • Username (Text): Your URL connection username. It is optional and will only be used if the data source requests it.
  • Password (Text): Your URL connection password. It is optional and will only be used if the data source requests it.
  • SFTP Key (Text): Your SFTP private key. It is optional, only relevant for SFTP, and will only be used if the data source requests it. This must be the complete private key, beginning with "-----BEGIN RSA PRIVATE KEY-----" and conforming to the same structure as an RSA private key.
  • S3 Path (S3 Tree): The S3 bucket and folder to copy the file to. A public S3 URL can be entered in the text box, although you must have write access.
  • Gzip S3 data (Choice): If set to Yes, the source data will be gzipped before being sent to S3. This compression is only between Matillion ETL and S3, not from the source data to Matillion ETL.
  • Encryption (Select, AWS Only): Decide how the files are encrypted inside the S3 bucket. This property is available when using an Existing Amazon S3 Location for staging.
      None: No encryption.
      SSE KMS: Encrypt the data according to a key stored on KMS.
      SSE S3: Encrypt the data using Amazon S3-managed encryption keys.
  • KMS Key ID (Select, AWS Only): The ID of the KMS encryption key chosen in the 'Encryption' property.
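
As a rough illustration of how the Gzip S3 data, Encryption, and KMS Key ID properties combine on an upload, the sketch below compresses data in flight and requests SSE-KMS encryption through boto3's ExtraArgs. This is an assumption-laden sketch rather than Matillion's code; the bucket, object name, and KMS key alias are hypothetical.

    # Sketch only: how 'Gzip S3 data' and 'Encryption' = SSE KMS might map
    # onto a boto3 upload. All names below are hypothetical placeholders.
    import gzip
    import io
    import boto3

    s3 = boto3.client("s3")

    source_bytes = b"col1,col2\n1,2\n"                    # stand-in for the source file
    compressed = io.BytesIO(gzip.compress(source_bytes))  # 'Gzip S3 data' = Yes

    s3.upload_fileobj(
        compressed,
        "my-staging-bucket",                   # bucket from 'S3 Path' (hypothetical)
        "landing/vehicles.csv.gz",             # 'Output Object Name'
        ExtraArgs={
            "ServerSideEncryption": "aws:kms", # 'Encryption' = SSE KMS
            "SSEKMSKeyId": "alias/my-kms-key", # 'KMS Key ID' (hypothetical alias)
        },
    )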

Example URLs

Each protocol, when entered for the first time, will have a sample URL associated with it, detailing the structure of the URL format for that protocol.

  • FTP: ftp://[username[:password]@]hostname[:port][path]
  • HDFS: hdfs://host:port/filePath
  • HTTP: http://[username[:password]@]hostname[:port][absolute-path]
  • HTTPS: https://[username[:password]@]hostname[:port][absolute-path]
  • SFTP: sftp://[username[:password]@]hostname[:port][path]
  • Windows Fileshare: smb://[[[authdomain;]user@]host[:port][/share[/dirpath][/name]]][?context]
  • S3 Bucket: s3://[bucketname][/path]
  • Google Cloud Storage Bucket: gs://[bucketURL][/path]


Square brackets indicate that the enclosed part of the URL is optional. In particular, entering the username and password within the URL is discouraged - it CAN be done, but it poses a potential security risk and may not work. Entering the username and password in the parameters provided is the preferred style.

Note: Special characters used in the URL field must be URL-safe. See documentation on URL Safe Characters for more information. Where possible, use the Username and Password fields to avoid special characters interfering with the URL (this also means passwords are not stored as plain text).
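
For illustration, the sketch below shows what URL-safe encoding involves when credentials must be embedded in a URL. The values are made up; percent-encoding uses Python's standard urllib.parse.quote.

    # Why special characters must be URL-safe: '@', ':' and '/' in a password
    # would break the URL unless percent-encoded. Credentials here are made up.
    from urllib.parse import quote

    username = "etl.user"
    password = "p@ss:word/1"

    url = "ftp://{}:{}@ftp.example.com/data/vehicles.zip".format(
        quote(username, safe=""), quote(password, safe="")
    )
    print(url)  # ftp://etl.user:p%40ss%3Aword%2F1@ftp.example.com/data/vehicles.zip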


Variable Exports

This component makes the following values available to export into variables:

  • Bytes written: The number of bytes read from the source and written to S3. If you have selected the Gzip option, this byte count is after compression.
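
The snippet below illustrates why the exported count is post-compression when Gzip is selected: only the compressed bytes are actually written to S3. The figures are illustrative only.

    # 'Bytes written' reflects what is sent to S3, so with Gzip enabled it is
    # the compressed size, not the size of the source data.
    import gzip

    source = b"a,b\n" * 10000     # stand-in for the source file (40,000 bytes)
    sent = gzip.compress(source)  # what would be written to S3 with Gzip = Yes

    print(len(source))  # bytes read from the source
    print(len(sent))    # 'Bytes written' would report this smaller figure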


Example

In this example, S3 Put Object is used in an Orchestration job to take a file from a government website and load it into an S3 bucket. The data from this file can then be loaded into a newly created table.

S3 Put Object requires a link to the file as well as several properties. The Input Data Type is HTTP to match the Input Data URL. Since our data is zipped, we choose to Unpack the ZIP file, and since it is a public file we need no Username or Password. The S3 Path is set to a bucket we own and have write permissions to. Finally, we choose not to Gzip the data, as we'll be using it immediately.

When run (to run it alone, right-click the component and select 'Run Component'), S3 Put Object will unzip and deposit the file given by the URL into the specified S3 bucket. Note that the name of the copied file will be the name of the file inside the ZIP, not the ZIP file itself. In this case, the resulting file is 'Vehicles_2015.csv'. The CSV file is now present in the S3 bucket and can, among other things, be used as a data source for a table.
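
A hedged sketch of this unpack-and-upload behaviour follows (placeholders throughout; this is not the component's actual code): the ZIP is fetched, extracted, and each member is uploaded under its original name.

    # Sketch of 'Unpack ZIP file': fetch the archive, extract it, and upload
    # each member under its original name. URL and bucket are hypothetical.
    import io
    import zipfile
    import boto3
    import requests

    s3 = boto3.client("s3")
    resp = requests.get("https://data.example.gov/Vehicles_2015.zip")
    resp.raise_for_status()

    with zipfile.ZipFile(io.BytesIO(resp.content)) as archive:
        for info in archive.infolist():
            if info.is_dir():   # skip directory entries
                continue
            # Members keep their original names, which is why the target
            # filename cannot be chosen when unpacking.
            with archive.open(info) as f:
                s3.upload_fileobj(f, "my-staging-bucket", info.filename)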

A table named 'Vehicles_2015' is created using the Create/Replace Table component. It is often a good idea to run this component independently by right-clicking it and selecting 'Run Component', ensuring the table already exists when the S3 Load component attempts to verify it.

The new table can then be loaded with data from the imported CSV file using the S3 Load component. The component is pointed to the correct S3 bucket using the 'S3 URL Location' property, and to the imported file using the 'S3 Object Prefix' property. It is important that the Data File Type is set to CSV, since we are working with a CSV file. The 'Target Table Name' property determines which table the data will populate. For more information, see documentation on S3 Load and S3 Load Generator.
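
For context, on a Redshift target the S3 Load component's properties correspond roughly to a COPY statement. The sketch below is a simplification with hypothetical table, bucket, and IAM role names, not the exact statement Matillion generates.

    # Rough equivalent of S3 Load's settings as a Redshift COPY (simplified;
    # all identifiers below are hypothetical).
    copy_sql = """
    COPY "Vehicles_2015"                        -- 'Target Table Name'
    FROM 's3://my-staging-bucket/landing/'      -- 'S3 URL Location' + 'S3 Object Prefix'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
    CSV                                         -- 'Data File Type' = CSV
    IGNOREHEADER 1;
    """
    # This would be executed against the cluster, e.g. cursor.execute(copy_sql)
    # with a psycopg2 connection.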


When run, the S3 Load component will take data from the file and place it into the table 'Vehicles_2015'. Finally, we can use a single Table Input component in a Transformation job to check that the data is correct. The component is simply pointed to our newly made table 'Vehicles_2015' and we retrieve a sample from the Sample tab. For more information, see the Table Input component documentation.

