
Updates from Multiple Delta Input Files

I get daily delta files (each containing only records with changes) from a production system, and I need to update a Redshift table with these deltas. I specifically need to apply them in date order, since a key might appear in multiple files. If the updates aren't applied in date order, older information could overwrite newer information.

If everything runs fine (daily processing), I should never have more than one file. But for whatever reason, I sometimes need to process multiple files. If I do a standard "Load to S3" and "Update/Insert", it appears that I have no way to ensure the updates are applied in date order, whether by using a timestamp field in the data or the filename (the filenames are prefixed with "YYMMDD").

What are my options? I'm going down the path of using a File Iterator to run each file through individually. Is that my best option?

Thanks.

2 Community Answers

Matillion Agent  

Jason Kane —

Hi Mark,

Thanks for contacting support. There are a couple of options you could try. If you implement Triggering ETL from an S3 Event via AWS Lambda with a FIFO queue, you can process files as they come in. Alternatively, you could use a Python script and boto3 to populate a table with your S3 contents, and then use the Table Iterator to loop through the resulting entries in date order.
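
For the second option, a minimal sketch might look like the following. The bucket name, key prefix, and the idea of a separate control table are placeholders for your own setup, not anything specific to your environment; it simply lists the delta files and orders them by the YYMMDD filename prefix so an iterator can process oldest-first.

# Sketch only: list delta files in S3 and order them by the YYMMDD filename
# prefix so a Table Iterator (or any loop) can process them oldest-first.
import boto3

BUCKET = "my-delta-bucket"   # assumed bucket name
PREFIX = "deltas/"           # assumed key prefix for the daily delta files

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

keys = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        keys.append(obj["Key"])

# Because the filenames start with YYMMDD, a plain lexicographic sort on the
# file name puts them in date order (oldest first).
keys.sort(key=lambda k: k.rsplit("/", 1)[-1])

# In a Python Script component you would insert these keys into a small
# control table and point a Table Iterator at it, ordered by this sequence;
# here we just print the processing order.
for seq, key in enumerate(keys, start=1):
    print(seq, key)

The Table Iterator would then read that control table ordered by the sequence column and run your "Load to S3" / "Update/Insert" steps once per file, which guarantees the deltas are applied oldest to newest.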

Thanks,
Jason


Mark Evans —

Thanks. Those are both good suggestions.
