File Iterator


The File Iterator component lets users loop over matching files in a remote file system.

The component searches the selected remote file system for files and runs its attached component once for each file found. Filenames and path names are mapped into environment variables, which can then be referenced from the attached component(s).

To attach the iterator to another component, use the blue output connector and link to the desired component. To detach, right-click on the attached component and click Disconnect from Iterator.

If you need to iterate more than one component, put them into a separate orchestration job or transformation job and use a Run Transformation or Run Orchestration component attached to the iterator. In this way, you can run an entire ETL flow multiple times, once for each matched file.

All iterator components are limited to a maximum of 5000 iterations.
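
The iteration model described above can be pictured with a short Python sketch. This is not Matillion ETL code: `FileMatch`, `run_iterator`, and the `run_component` callback are hypothetical names, and simply stopping once the limit is reached is an assumption about how the cap is applied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

ITERATION_LIMIT = 5000  # documented ceiling for all iterator components


@dataclass
class FileMatch:
    """Hypothetical record for one file found by the search."""
    base_path: str
    subfolder: str
    filename: str


def run_iterator(
    matches: Iterable[FileMatch],
    run_component: Callable[[Dict[str, str]], None],
    max_iterations: int = ITERATION_LIMIT,
) -> int:
    """Run the attached component once per matched file; return the run count."""
    limit = min(max_iterations, ITERATION_LIMIT)
    runs = 0
    for match in matches:
        if runs >= limit:
            break  # assumption: matches beyond the limit are not processed
        # Each iteration receives the variable values for the current file.
        variables = {
            "base_path": match.base_path,
            "subfolder": match.subfolder,
            "filename": match.filename,
        }
        run_component(variables)
        runs += 1
    return runs


# Three matching files, but only two iterations are allowed.
seen: List[str] = []
files = [FileMatch("s3://example-bucket/data", "2021", f"sales_{i}.gz") for i in range(3)]
count = run_iterator(files, lambda v: seen.append(v["filename"]), max_iterations=2)
```

Here the callback stands in for the attached component; in a real job it would be, for example, a Run Orchestration component that reads the same variables.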

Properties

Property	Setting	Description
Name	String	A human-readable name for the component.
Input Data Type	Select	Select the remote file system to search. Available data types include: Azure Blob Storage, Cloud Storage, FTP, HDFS, S3, SFTP, and Windows Fileshare.
Input Data URL	String / Select	Select or input the URL, including the full path and file name, that will point to the files to download to the selected staging area. Once you have selected the connection's Input Data Type, Matillion ETL will provide a template URL string. Note: Special characters used in this field (e.g. in usernames and passwords) must be URL-safe. For more information, please refer to our Safe Characters documentation.
Domain	String	Input your connection domain.
SFTP Key	String	Input your SFTP private key. This property will only be used if the data source requests it. This property is only available when the Input Data Type is set to SFTP.
Username	String	Input your URL connection username. This property will only be used if the data source requests it.
Password	String	Input your URL connection password. This property will only be used if the data source requests it. Users can store passwords in the component itself, or use the secure Password Manager feature (recommended).
Set Home Directory as Root	Select	No: Designates that the URL path is from the server root. Yes: Designates that the URL path is relative to the user's home directory. Default setting is Yes. This property is only available when the Input Data Type is set to either FTP or SFTP.
Recursive	Select	No: Only search for files within the folder identified by the Input Data URL. Yes: Consider files in subdirectories when searching for files. This property is only available when the Input Data Type is set to FTP, SFTP, or Windows Fileshare.
Max Recursion Depth	Integer	Set the maximum recursion depth into subdirectories. This property is only available when Recursive is set to Yes.
Ignore Hidden	Select	No: Include "hidden" files. Yes: Ignore "hidden" files, even if they otherwise match the Filter Regex. Default setting is Yes.
Max Iterations	Integer	Set the total number of iterations to perform. As mentioned earlier, the maximum cannot exceed 5000.
Filter Regex	String	The Java-standard regular expression used to test against each candidate file's full path. To match all files, specify .*
Concurrency	Select	Concurrent: Iterations are run concurrently. This requires all "Variables to Iterate" to be defined as copied variables, so that each iteration gets its own copy of the variable isolated from the same variable being used by other concurrent executions. Sequential: Iterations are done in sequence, waiting for each to complete before starting the next. This is the default setting. Note: The maximum concurrency is limited by the number of available threads (2x the number of processors on your cloud instance).
Variables	Variable	An existing environment variable to hold the given value of the Path Selection.
Variables	File Attribute	For each matched file, the target variable can be populated with the Base Path, the Subfolder (useful when recursing), the Filename, or the date when the file was Last Modified. You can export any or all of these into variables used by each iteration. For the Last Modified attribute, the date is formatted as ISO 8601, with a UTC indicator. For example: 2021-01-04T10:45:15.123Z. Users may experience a lag in how their data warehousing platform updates the last modified date, for example between when Matillion ETL interacts with the file and the actual last modified date. This behaviour is a limitation of the platform and is subject to that platform's metadata. See the Example section for the difference between Base Path, Subfolder, and Filename.
Break on Failure	Select	No: Attempt to run the attached component for each iteration, regardless of success or failure. This is the default setting. Yes: If the attached component does not run successfully, fail immediately. Note i: If a failure occurs during any iteration, the failure link is followed. This parameter controls whether it is followed immediately or after all iterations have been attempted. Note ii: This property is only available when Concurrency is set to Sequential. When set to Concurrent, all iterations will be attempted.
Record Values In Task History	Select	Choose whether to record iteration values in the Matillion ETL Task History. The default setting is Yes.
Stop On Condition	Select	Select Yes to stop the iteration based on a condition specified in the Condition property. The default setting is No. For this property to be available, set Concurrency to Sequential.
Mode	Select	Select the method of creating the condition. Simple: A no-code Condition UI will open, where users must specify an Input Variable, Qualifier, Comparator, and Value using drop-down menus and text fields. This is the default setting. Advanced: An editor will open, where users must write the condition manually using JavaScript.
Condition (Simple mode)	Input Variable	An input variable to form a condition around.
	Qualifier	Is: Compares the input variable to the value using the comparator. Not: Reverses the effect of the comparison, so "Equals" becomes "Not equals", "Less than" becomes "Greater than or equal to", etc.
	Comparator	Select the comparator. Available comparison operators include "Less than", "Less than or equal to", "Equal to", "Greater than or equal to", "Greater than", and "Blank".
	Value	Specify the value to be compared.
Condition (Advanced mode)	Text Editor	Manually write the condition in the editor. This editor accepts conditions written in JavaScript.
Combine Conditions	Select	Use the defined conditions in combination with one another according to either And or Or. This property is only available when Mode is set to Simple.
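
As an illustration of the Last Modified format described in the Variables row, the following Python sketch parses the example timestamp and compares it with a cutoff date. It assumes the value always carries fractional seconds and a trailing `Z`, as in the example above; `parse_last_modified` and the cutoff are hypothetical.

```python
from datetime import datetime, timezone


def parse_last_modified(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2021-01-04T10:45:15.123Z."""
    # The literal "Z" marks UTC; strptime leaves the result naive, so the
    # UTC timezone is attached explicitly afterwards.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


last_modified = parse_last_modified("2021-01-04T10:45:15.123Z")

# Hypothetical check: only process files modified on or after 1 January 2021.
cutoff = datetime(2021, 1, 1, tzinfo=timezone.utc)
is_recent = last_modified >= cutoff
```

Comparing timezone-aware values avoids mistakes when the machine running the check is not set to UTC.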

Variable Exports

This component makes the following values available to export into variables:

Source	Description
Iteration Attempted	The number of iterations that this component attempts to reach (Max Iterations parameter).
Iteration Generated	The number of iterations that have been initiated. Iterators terminate after failure, so this number will be the successful iterations plus any potential failures.
Iteration Successful	The number of iterations successfully performed. This is the max iteration number, minus failures and any unattempted iterations (since the component terminates after failure).
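
One way to read the three exported counts is sketched below in Python for a sequential run. This is an interpretation of the table above, not the product's implementation: `run_sequential`, the `flaky_component` callback, and the dictionary keys are hypothetical.

```python
from typing import Callable, Dict, List


def run_sequential(
    items: List[str],
    run_component: Callable[[str], None],
    max_iterations: int,
    break_on_failure: bool,
) -> Dict[str, int]:
    """Run items in order and return the three iteration counts."""
    counts = {"attempted": max_iterations, "generated": 0, "successful": 0}
    for item in items[:max_iterations]:
        counts["generated"] += 1  # the iteration has been initiated
        try:
            run_component(item)
        except Exception:
            if break_on_failure:
                break  # stop immediately; later items are never started
            continue
        counts["successful"] += 1
    return counts


def flaky_component(filename: str) -> None:
    """Stand-in for the attached component; fails on one file."""
    if filename == "bad.gz":
        raise RuntimeError("transfer failed")


files = ["a.gz", "bad.gz", "c.gz"]
stop_early = run_sequential(files, flaky_component, 5000, break_on_failure=True)
run_all = run_sequential(files, flaky_component, 5000, break_on_failure=False)
```

With Break on Failure enabled, the third file is never started, so the generated count is 2 and the successful count is 1. With it disabled, all three are started and two succeed.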

Example

This example shows how specific files can be transferred from an S3 bucket to a Google Cloud Storage bucket. This will be done by using the File Iterator component in conjunction with the Data Transfer component.

The File Iterator component is set up to point to an Input Data URL (this is the Base Path). The File Iterator recurses any found folders/directories (this is the Subfolder), and matches files like "sales_.*.gz" (this is the Filename).
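
The pattern in this example can be tried out with a short Python sketch. Python's `re` module is used here in place of Java's regex engine; for patterns this simple the two behave the same. The paths are hypothetical, and using a partial match (`search`) rather than a whole-path match is an assumption about how the filter is applied.

```python
import re

paths = [
    "s3://example-bucket/data/2021/sales_01.gz",
    "s3://example-bucket/data/2021/sales_01.csv",
    "s3://example-bucket/data/2021/sales_01_ngz",
]

# In "sales_.*.gz" the dot before "gz" is unescaped, so it matches any character.
loose = re.compile(r"sales_.*.gz")

# Escaping the dot and anchoring with "$" requires a literal ".gz" extension.
strict = re.compile(r"sales_.*\.gz$")

loose_hits = [p for p in paths if loose.search(p)]
strict_hits = [p for p in paths if strict.search(p)]
```

The loose pattern also accepts `sales_01_ngz`, which is why escaping the dot is usually worthwhile when filtering by file extension.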

In this example, the variable mapping is set up to provide both the "subfolder" and the "filename" into environment variables.

Those variables can then be referenced from the attached Data Transfer component both in the Input Data URL and Target Object Name.
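
The substitution that happens on each iteration can be imitated in Python with `string.Template`, which resolves `${name}` placeholders. The bucket name, the variable names `subfolder` and `filename`, and the `${...}` reference style are assumptions made for this sketch.

```python
from string import Template

# Values the iterator would supply for one matched file (hypothetical).
iteration_variables = {"subfolder": "2021/01", "filename": "sales_01.gz"}

# Templates standing in for the Data Transfer component's
# Input Data URL and Target Object Name properties.
source_url = Template("s3://example-source-bucket/${subfolder}/${filename}")
target_name = Template("${subfolder}/${filename}")

resolved_source = source_url.substitute(iteration_variables)
resolved_target = target_name.substitute(iteration_variables)
```

Because both properties reference the same variables, the folder layout of the source is preserved in the target.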

At runtime, any matching files are uploaded to the Google Cloud Storage bucket.
