File Iterator Component
Loop over matching files in a remote file system.
This component searches for files in a remote file system (several types are supported), running its attached component once for each file found. Filenames and pathnames are mapped into environment variables that can be referenced from the attached component(s).
If you need to iterate more than one component, put them into a separate orchestration or transformation job and use a Run Transformation or Run Orchestration component attached to the iterator. In this way, you can run an entire ETL flow multiple times, once for each file found.
|Name||Text||The descriptive name for the component.|
|Input Data Type||Select||The remote file system to search. Choose one of FTP, SFTP, HDFS, Windows Fileshare or Amazon S3 Bucket.|
|Input Data URL||Select||A URL pointing to a folder. The component will offer a template URL once Input Data Type has been selected. Point this at the folder containing the files you wish to iterate over; subfolders are also considered if Recursive is set to Yes.|
|Username||Text||This is your URL connection username. It is optional and will only be used if the data source requests it.|
|Password||Text||This is your URL connection password. It is optional and will only be used if the data source requests it.|
|SFTP Key||Text||This is your SFTP private key. It is optional, only relevant for SFTP, and will only be used if the data source requests it. This must be the complete private key, in OpenSSH PEM format.|
|Set Home Directory as Root||Choice||Used with (S)FTP. By default, URLs are relative to the user's home directory. Setting this to No tells Matillion ETL for Redshift that the given path is relative to the server root.|
|Recursive||Select||Yes: consider files in subdirectories when searching for files.
No: only search for files within the folder identified by the Input Data URL.|
|Ignore Hidden||Select||Yes: ignore 'hidden' files, even if they otherwise match the Filter Regex.
No: include 'hidden' files.|
|Filter Regex||Text||The regular expression used to test against each candidate file's full path. If you want ALL files, specify .*|
|Variables||Target Variable||An existing environment variable to hold the value of the corresponding Path Section.|
|Path Section||Select||For each matched file, the target variable can be populated with the Base Path, the Subfolder (useful when recursing) or the Filename. You can export any or all of these into variables used by each iteration. See the example below for the difference between these.|
|Max Recursion Depth||Integer||Recursively searching subdirectories can be expensive. Use this setting to control how many levels deep to search into the directory structure.|
|Max Iterations||Integer||The maximum total number of iterations to perform, even if more matching files are found.|
|Concurrency||Select||Sequential: Iterations run in sequence, each waiting for the previous one to complete before starting. This is the default.
Concurrent: Iterations run concurrently. This requires all "Variables to Iterate" to be defined as local variables, so that each iteration gets its own copy of the variable, isolated from the same variable being used by other concurrent executions.|
|Break on Failure||Select||No: attempt to run the attached component for each iteration, regardless of success or failure.
Yes: if the attached component does not run successfully, fail immediately.
Note: If a failure occurs during any iteration, the failure link is followed. This parameter controls whether it is followed immediately or only after all iterations have been attempted.
Note: This parameter is only available in Sequential mode. When running concurrently, all iterations will be attempted.|
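Taken together, the filtering parameters behave roughly as follows. This is a conceptual sketch in Python, not the component's actual implementation; the function name and the `(path, depth)` listing format are invented for illustration:

```python
import re

def iterate_files(listing, filter_regex, recursive=True,
                  max_depth=10, max_iterations=None, ignore_hidden=True):
    """Sketch of the iterator's matching rules.

    `listing` is a list of (path, depth) tuples standing in for a
    remote directory listing; the real component queries the remote
    file system directly.
    """
    pattern = re.compile(filter_regex)
    matches = []
    for path, depth in listing:
        if not recursive and depth > 0:
            continue                      # Recursive = No: top-level folder only
        if depth > max_depth:
            continue                      # Max Recursion Depth
        name = path.rsplit("/", 1)[-1]
        if ignore_hidden and name.startswith("."):
            continue                      # Ignore Hidden
        if pattern.fullmatch(path):       # Filter Regex tests the full path
            matches.append(path)
        if max_iterations and len(matches) >= max_iterations:
            break                         # Max Iterations
    return matches
```

For example, with a filter of `.*sales_.*\.dat`, only the sales files match; with `.*` and Recursive set to No, only top-level files are iterated.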
This component makes the following values available to export into variables:
|Iteration Count||The number of iterations performed (files found), up to the Max Iterations parameter.|
This example uploads files to S3 from a remote SFTP location to which a company's branches upload daily sales data, each branch writing to its own folder every day.
We'll use a File Iterator in conjunction with S3 Put to upload all of these files to S3.
We can set up the iterator to point to /data/ (this is the Base Path), ask it to recurse into any folders it finds (these become the Subfolder), and match files with the regex "sales_.*.dat" (this is the Filename).
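Under those settings, each matched path decomposes into the three Path Section values. A minimal sketch of that split (the function name is illustrative, not part of the product):

```python
def split_match(base_path, full_path):
    """Split a matched file path into the three Path Section values
    the iterator can export: Base Path, Subfolder and Filename."""
    if not full_path.startswith(base_path):
        raise ValueError("file is not under the base path")
    remainder = full_path[len(base_path):]
    # Everything after the base path, up to the last "/", is the subfolder
    subfolder, _, filename = remainder.rpartition("/")
    return {
        "base_path": base_path,
        "subfolder": subfolder + "/" if subfolder else "",
        "filename": filename,
    }
```

So a file at /data/branch1/sales_20200101.dat splits into Base Path /data/, Subfolder branch1/, and Filename sales_20200101.dat; a file directly inside /data/ has an empty Subfolder.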
The variable mapping is set up to provide both the subfolder and the filename as environment variables:
Those variables can then be referenced from the attached S3 Put component both in its Input Data URL and Output Object Name:
At runtime, any matching files are uploaded to S3.
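Assuming the usual ${variable} reference syntax for component properties, the substitution performed for each iteration looks roughly like this; the host name, paths and variable names below are illustrative only:

```python
from string import Template

# Hypothetical values exported by one iteration of the File Iterator
env = {"subfolder": "branch1/", "filename": "sales_20200101.dat"}

# ${variable} references in the attached component's properties are
# resolved per iteration (host and paths are made up for this sketch)
input_url = Template("sftp://example-host/data/${subfolder}${filename}").substitute(env)
object_name = Template("${subfolder}${filename}").substitute(env)
```

Each iteration therefore reads one matched file from the SFTP location and writes it to S3 under an object name that preserves the branch's subfolder.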