
Preparing a file for S3 load

Hi!

We have a Table called ACCOUNT.
We generally receive a file in this format: ACCOUNT_UIP_20180201_161512.TXT
What we do is first manually rename the file to ACCOUNT_UIP.txt
and then place it under a folder in the S3 bucket.
(Note: ACCOUNT_UIP is the name we want; 20180201 is the date; 161512 is the hour, minute, and second.)
We do this type of exercise for all the files, like product, customer, geography, etc.
In the S3 object prefix for the respective loading jobs we have written ACCOUNT_UIP.txt, CUSTOMER_UIP.txt, GEOGRAPHY_UIP.txt.
Now we want to automate this process.
That means when files are available in the S3 bucket, some process should first rename those files into the required format so that Matillion can load them.
Could you please help me with this?
What are the possible approaches to this requirement?


4 Community Answers

Matillion Agent  

Laura Malins —

Hi Nachiket

The S3 Load component supports object prefixes, so you could just configure Matillion to load any files starting with ACCOUNT_UIP_.

If you really want to rename them, this is easy to do with a Bash Script component using the AWS CLI.

Your command would be something like:

aws s3 mv s3://my-bucket-name/ACCOUNT_UIP_20180201_161512.TXT s3://my-bucket-name/ACCOUNT_UIP.txt

You can make the first file name a variable value if required.
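To make the first file name a variable as Laura suggests, one possible sketch (assuming the bucket name `my-bucket-name` and the ACCOUNT_UIP_YYYYMMDD_HHMMSS.TXT naming pattern from this thread) derives the fixed target name with Bash parameter expansion and then issues the `aws s3 mv`:

```shell
#!/usr/bin/env bash
# Sketch only: derive the fixed target name from a dated file name, then rename in S3.
set -euo pipefail

bucket="my-bucket-name"                # hypothetical bucket name
src="ACCOUNT_UIP_20180201_161512.TXT"  # in Matillion this could come from a job variable

# Strip the trailing _YYYYMMDD_HHMMSS.TXT suffix (everything from the first
# underscore followed by a digit) to get the base name, then append .txt
base="${src%%_[0-9]*}"                 # ACCOUNT_UIP
dest="${base}.txt"                     # ACCOUNT_UIP.txt

# Printed as a dry run; drop the echo to actually perform the rename.
echo "aws s3 mv s3://${bucket}/${src} s3://${bucket}/${dest}"
```

The same derivation works unchanged for CUSTOMER_UIP or GEOGRAPHY_UIP files, since the date suffix always starts with an underscore followed by a digit.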

Thanks
Laura


Nachiket Mehendale —

Thanks Laura. Yes, renaming would work. Is there a way I can simply write:

aws s3 mv s3://my-bucket-name/ACCOUNT_UIP*.txt s3://my-bucket-name/ACCOUNT_UIP.txt

And also, do you think the Manifest object functionality could work as another option to solve this issue?


Matillion Agent  

Laura Malins —

Hi Nachiket

The Manifest would work, but it is an unnecessary step: Matillion automatically supports object prefixes in the S3 Load component, so you can just load everything starting with ACCOUNT_UIP_. I'd recommend this rather than renaming.
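As a local illustration of the prefix idea (not the component itself), the S3 Load component's object prefix behaves like a starts-with match on the key, so every dated file is picked up without any renaming. A sketch with hypothetical key names:

```shell
#!/usr/bin/env bash
# Illustration only: object-prefix matching selects every key that starts
# with the prefix, so dated ACCOUNT_UIP_* files need no rename before loading.
set -euo pipefail

# Hypothetical keys, e.g. as returned by `aws s3 ls s3://my-bucket-name/`
keys=(
  "ACCOUNT_UIP_20180201_161512.TXT"
  "CUSTOMER_UIP_20180201_161512.TXT"
  "ACCOUNT_UIP_20180202_093000.TXT"
)

for k in "${keys[@]}"; do
  case "$k" in
    ACCOUNT_UIP_*) echo "would load: $k" ;;  # both dated ACCOUNT files match
  esac
done
```

Here both ACCOUNT_UIP_ files match and the CUSTOMER_UIP_ file does not, which is exactly the selection the prefix gives you per loading job.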

You can use a wildcard in the AWS CLI, but you'll need the --recursive flag. My worry with that approach is that you can potentially lose files: if you have two files in your bucket, say ACCOUNT_UIP_20180203_161512.TXT and ACCOUNT_UIP_20180202_161512.TXT, both would be renamed to the same name and you would lose the first one renamed.
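One way to avoid that overwrite, sketched here with hypothetical file names, is to rename only the newest matching file: because the timestamp is embedded in the name, a plain lexicographic sort picks the latest one.

```shell
#!/usr/bin/env bash
# Sketch only: rename just the newest dated file so earlier ones are not
# silently overwritten by a blanket wildcard rename.
set -euo pipefail

# Hypothetical matching keys, e.g. collected from `aws s3 ls`
files=(
  "ACCOUNT_UIP_20180202_161512.TXT"
  "ACCOUNT_UIP_20180203_161512.TXT"
)

# YYYYMMDD_HHMMSS sorts chronologically as plain text, so the last entry is newest.
latest=$(printf '%s\n' "${files[@]}" | sort | tail -n 1)

# Printed as a dry run; drop the echo to actually perform the rename.
echo "aws s3 mv s3://my-bucket-name/${latest} s3://my-bucket-name/ACCOUNT_UIP.txt"
```

The older files would then still need handling (archiving or loading separately), which is another reason the prefix-based load is the simpler route.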

Thanks
Laura


Nachiket Mehendale —

Thanks Laura. Before moving files to the S3 bucket, I am thinking of taking a backup of everything in the folder and placing it in a newly created folder. We are also thinking of adding a rule that no two files (e.g. ACCOUNT_UIP_20180101_101010.txt and ACCOUNT_UIP_20180101_121212.txt) can be moved to the S3 bucket at the same time; otherwise the overriding would prevent all the files from being loaded.
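That backup-then-rename idea could be sketched as follows, assuming a hypothetical backup/ prefix in the same bucket: copy the dated file to a dated backup key first, so nothing is lost, then rename it for the load.

```shell
#!/usr/bin/env bash
# Sketch only: back up the dated file under backup/, then rename it for Matillion.
set -euo pipefail

bucket="my-bucket-name"                # hypothetical bucket name
src="ACCOUNT_UIP_20180101_101010.txt"  # hypothetical incoming file

# Printed as a dry run; drop the echoes to actually copy and rename.
echo "aws s3 cp s3://${bucket}/${src} s3://${bucket}/backup/${src}"
echo "aws s3 mv s3://${bucket}/${src} s3://${bucket}/ACCOUNT_UIP.txt"
```

Because the backup key keeps the original dated name, two files from the same day never collide under backup/ even if ACCOUNT_UIP.txt itself is overwritten.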
