
Row count with table iterator

I am loading files from S3 into a temp table on Redshift, then using a table iterator to load the data into my fact tables.

I want to do a row count check: the data loaded into the temp table from S3 should match the data loaded into all the fact tables combined by the iterator.

I am able to export the count from the S3 Load component. Is it possible to get the total rows loaded by the iterator?

One potential solution is to create a variable and, as part of the iterator job, have a Python script that adds the current value to the running total. But can it be done in a simpler way?

1 Community Answer

Matillion Agent  

Kalyan Arangam —

Hi Asad,

Row counts are exposed by Load components and transformation jobs. An iterator does not fall into either category.

There may be many ways of getting the final count. Here's one:

  • Declare a Shared, Project-level variable (row_count_final) and initialise it to zero (0) using a Python Script component.
  • Add a child job to the iterator which contains the S3 Load component.
  • In your child job, declare a job-level variable row_count.
  • Configure the variable row_count on the Export tab of the S3 Load component.
  • Add a Python Script component after the S3 Load and add the following lines:
row_count_final = row_count_final + row_count
context.updateVariable('row_count_final', row_count_final)
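Outside of Matillion, the accumulation logic in those two lines can be sketched as follows. The `StubContext` class here is a minimal stand-in for Matillion's built-in `context` object (which exposes `updateVariable`); the variable names come from the steps above, and the per-iteration counts are made up for illustration:

```python
# Minimal stand-in for Matillion's script context: it stores job
# variables in a dict so the accumulation logic can be run locally.
class StubContext:
    def __init__(self, variables):
        self.variables = dict(variables)

    def updateVariable(self, name, value):
        self.variables[name] = value


# row_count_final starts at 0, as set by the initialisation script.
context = StubContext({'row_count_final': 0})

# Simulate three iterations of the child job; each iteration's S3 Load
# exports its own row_count, which the script adds to the running total.
for row_count in [120, 85, 300]:
    row_count_final = context.variables['row_count_final']
    # Same two lines as in the child job's Python Script component:
    row_count_final = row_count_final + row_count
    context.updateVariable('row_count_final', row_count_final)

print(context.variables['row_count_final'])  # 505
```

Note that if the shared variable is declared with a Text type rather than Numeric, the values may arrive as strings and need an explicit `int()` conversion before adding.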

Next time you load files using the iterator, the variable row_count_final should hold the total rows loaded by the S3 Load component across all iterations.

A simpler approach may be to capture a row count into a variable before you load the files and then take a final count after you finish processing. The difference should give you the expected number.
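That before/after approach can be sketched like this. In a real job the two counts would come from SQL queries against the fact tables (for example via Matillion's `context.cursor()`); here the counts are hard-coded so the arithmetic is clear, and the table name in the comment is hypothetical:

```python
# Hypothetical before/after row-count check. In a Matillion Python
# Script component the counts could be captured with something like:
#   cursor = context.cursor()
#   cursor.execute('SELECT COUNT(*) FROM my_fact_table')
#   count_before = cursor.fetchone()[0]
count_before = 10000   # total fact-table rows before the iterator runs
count_after = 10505    # total fact-table rows after the iterator finishes

# Rows loaded by the iterator = growth of the fact tables.
rows_loaded = count_after - count_before
print(rows_loaded)  # 505
```

This only works reliably if nothing else writes to the fact tables while the iterator is running.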

Hope this helps.

