Name is required.
Email address is required.
Invalid email address
Answer is required.
Exceeding max length of 5KB

Load Excel file, only if recently modified?

I am using the "Excel Query" component, to load data from and Excel file I have stored on S3.

When I go into S3 in the AWS console, I see each file has a "Last modified" timestamp.

In Matillion, I would like to run the "Excel Query" component if the Excel file was last modified in the last, say, 24 hours. (If the source Excel file has not been modified recently, then I don't want to bother loading it again, during my scheduled job.)

Is there any way I can do this?

(For example, from Matillion, can I somehow access the "Last modified" timestamp of a file in S3? If so, then I suppose I could store the timestamp value in an environment variable, then use an "If" component to run the "Excel Query" only if the timestamp meets my criteria.)

2 Community Answers

Matillion Agent  

Craig Rouse —

Hi Kevin,
You could used a bash or python script to access the AWS S3 bucket (aws s3 ls s3://myBucket) and retrieve the filenames and timestamps, then loop through the files and write out another file to this (or another) S3 containing filenames that fit the criteria.
Then use this new file in a file iterator over the Excel component, passing in the filename as a variable.
That’s one way, I’m sure there are others.
Thanks,
Craig


Kevin Havice —

Great, yes...here is the Python script I used to get the last modified timestamp from S3...

import boto3
s3 = boto3.client('s3')
file_metadata = s3.head_object(Bucket='welo-bi-data-sources', Key='locations.xlsx')
file_last_modified_utc = file_metadata['LastModified']
print(file_last_modified_utc.strftime("%Y-%m-%d"))

...Thanks!

Post Your Community Answer

To add an answer please login