File Iterator Component

Loop over matching files in a remote file system.

This component searches for files in a number of remote file systems, running its attached component once for each file found. Filenames and pathnames are mapped into environment variables that can be referenced from the attached component(s).

To attach the iterator to another component, use the blue output connector and link to the desired component. To detach, right-click on the attached component and select 'Disconnect from Iterator'.

If you need to iterate more than one component, put them into a separate orchestration or transformation job and use a Run Transformation or Run Orchestration component attached to the iterator. In this way, you can run an entire ETL flow multiple times, once for each file found.

Properties

Property Setting Description
Name Text The descriptive name for the component.
Input Data Type Select The remote file system to search. Choose one of FTP, SFTP, HDFS, Windows Fileshare, or Amazon S3 Bucket.
Input Data URL Select The URL, including full path and file name, that points to the files to iterate over. The format of the URL varies considerably; however, a default 'template' is offered once you have chosen a connection protocol.
Note: Special characters used in this field (e.g. in usernames and passwords) must be URL-safe. See the documentation on URL Safe Characters for more information. This can be avoided by using the Username and Password properties.
Domain Text The domain in which the Windows Fileshare resides. Only used when the Input Data Type is Windows Fileshare.
Username Text Your connection username. Optional; it is used only if the data source requires it.
Password Text Your connection password. Optional; it is used only if the data source requires it.
SFTP Key Text This is your SFTP Private Key. It is optional, only relevant for SFTP, and will only be used if the data source requests it.
This must be the complete private key, in OpenSSH PEM format.
Set Home Directory as Root Choice Used with (S)FTP. By default, URLs are relative to the user's home directory. Setting this to No tells Matillion ETL that the given path is relative to the server root.
Recursive Select Yes: consider files in subdirectories when searching for files.
No: only search for files within the folder identified by the Input Data URL.
Ignore Hidden Select Yes: ignore 'hidden' files, even if they otherwise match the Filter Regex.
No: include 'hidden' files.
Filter Regex Text The regular expression used to test against each candidate file's full path. If you want ALL files, specify .*
Variables Target Variable An existing environment variable to hold the value of the chosen Path Section.
Path Section For each matched file, the target variable can be populated with the Base Path, the Subfolder (useful when recursing) or the Filename. You can export any or all of these into variables used by each iteration.
See the example for the difference between these.
Max Recursion Depth Integer Recursively searching subdirectories can be expensive. Use this setting to control how many levels deep to search into the directory structure. (Property only appears if Recursive is 'Yes')
Max Iterations Integer The maximum number of iterations to perform, even if more matching files are found.
Concurrency Select Sequential: Iterations are done in sequence, waiting for each to complete before starting the next. This is the default.
Concurrent: Iterations are run concurrently. This requires all "Variables to Iterate" to be defined as local variables, so that each iteration gets its own copy of the variable, isolated from the same variable being used by other concurrent executions.
Break on Failure Select No: Attempt to run the attached component for each iteration, regardless of success or failure.
Yes: If the attached component does not run successfully, fail immediately.
Note: If a failure occurs during any iteration, the failure link is followed. This parameter controls whether it is followed immediately or after all iterations have been attempted.
Note: This is only available in Sequential mode. When running with concurrency, all iterations will be attempted.
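The selection rules described above (recursion, hidden-file handling, the regex tested against the full path, the depth and iteration caps, and the Base Path/Subfolder/Filename split) can be sketched in Python. This is an illustrative approximation only; the function and parameter names are not Matillion internals:

```python
import os
import re

def find_files(base_path, filter_regex, recursive=True,
               ignore_hidden=True, max_depth=3, max_iterations=100):
    # Illustrative sketch of the File Iterator's matching rules.
    pattern = re.compile(filter_regex)
    matches = []
    for root, dirs, files in os.walk(base_path):
        rel = os.path.relpath(root, base_path)
        depth = 0 if rel == '.' else rel.count(os.sep) + 1
        if not recursive or depth >= max_depth:
            dirs[:] = []              # stop descending (Recursive = No, or depth cap hit)
        for name in sorted(files):
            if ignore_hidden and name.startswith('.'):
                continue              # Ignore Hidden = Yes
            full_path = os.path.join(root, name)
            if pattern.search(full_path):   # Filter Regex is tested against the full path
                subfolder = '' if rel == '.' else rel + os.sep
                # Each match yields the three Path Sections:
                matches.append((base_path, subfolder, name))
                if len(matches) >= max_iterations:
                    return matches    # Max Iterations reached
    return matches
```

For example, with a Base Path of /data/, a filter of sales_.*\.dat, and a file at /data/london/sales_2021-01-01.dat, the match splits into Base Path '/data/', Subfolder 'london/', and Filename 'sales_2021-01-01.dat'.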

Variable Exports

This component makes the following values available to export into variables:

Source Description
Iteration Attempted The number of iterations that this component attempts to reach (Max Iterations parameter).
Iteration Generated The number of iterations that have been initiated. Because the iterator terminates after a failure, this is the number of successful iterations, plus one if a failure occurred.
Iteration Successful The number of iterations performed successfully. This is the number of attempted iterations, minus any failure and any iterations left unattempted after that failure (since the component terminates after failure).
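As a sketch of how these three counters relate, the loop below simulates an iterator running with Break on Failure enabled. The task functions and names are illustrative assumptions, not Matillion internals:

```python
def run_iterations(tasks, max_iterations, break_on_failure=True):
    # Illustrative simulation of the iterator's exported counters.
    attempted = min(len(tasks), max_iterations)   # Iteration Attempted
    generated = successful = 0
    for task in tasks[:attempted]:
        generated += 1                            # Iteration Generated
        if task():                                # task returns True on success
            successful += 1                       # Iteration Successful
        elif break_on_failure:
            break                                 # later iterations go unattempted
    return attempted, generated, successful
```

With four matched files where the third fails, this yields 4 attempted, 3 generated, and 2 successful iterations.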

Example

This example uploads files to S3 from a remote SFTP location that a company uses to collect daily sales data. Each branch uploads its sales information to a different folder each day.

We'll use a File Iterator in conjunction with S3 Put to upload all of these files to S3.

We can set up the iterator to point to /data/ (this is the Base Path), ask it to recurse into any subfolders it finds (this is the Subfolder), and match files like "sales_.*.dat" (this is the Filename).

The variable mapping is set up to provide both the subfolder and the filename as environment variables:

Those variables can then be referenced from the attached S3 Put component both in its Input Data URL and Output Object Name:

At runtime, any matching files are uploaded to S3.
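As a sketch of how the exported variables combine inside the S3 Put properties, assume they were named subfolder and filename (hypothetical names) and substituted with a ${var} syntax; the substitution shown here is a simplified stand-in, not Matillion's resolver:

```python
def resolve_url(template, variables):
    # Minimal sketch of substituting exported variables into a property value;
    # the ${var} placeholders and variable names are illustrative only.
    for name, value in variables.items():
        template = template.replace('${' + name + '}', value)
    return template

# Build the Output Object Name for one matched file.
object_name = resolve_url('s3://my-bucket/daily/${subfolder}${filename}',
                          {'subfolder': 'london/',
                           'filename': 'sales_2021-01-01.dat'})
```

Each iteration repeats this substitution with that file's subfolder and filename, so every matched file lands under its own key in the bucket.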

Example

This example uploads files to Google Storage from a remote SFTP location that a company uses to collect daily sales data. Each branch uploads its sales information to a different folder each day.

We'll use a File Iterator in conjunction with Google Storage Put to upload all of these files to a Google Storage bucket.

We can set up the iterator to point to /data/ (this is the Base Path), ask it to recurse into any subfolders it finds (this is the Subfolder), and match files like "sales_.*.dat" (this is the Filename).

The variable mapping is set up to provide both the subfolder and the filename as environment variables:

Those variables can then be referenced from the attached Google Storage Put component both in its Input Data URL and Output Object Name:

At runtime, any matching files are uploaded to Google Storage.
