AWS Billing Data

Has anyone worked out a solution using Matillion to import their AWS billing data into Redshift? Here is AWS's documentation on how to get billing and usage data into Redshift:

http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-reports-costusage-upload.html

The problem we see is that there is no way from Matillion to execute the .SQL file they produce in order to create the proper schema in Redshift. We have already seen month-to-month differences in the schema, so it's important to be able to create the monthly tables in Redshift according to the CREATE statement they produce.

Any ideas/suggestions are welcome.

8 Community Answers

Matillion Agent  

Ian Funnell —

Hi Ed,

You should be able to execute the .SQL file from an SQL Script component. The generated SQL contains a CREATE TABLE statement and a COPY (i.e. a bulk load).

Other alternatives might be to use the Matillion S3 Load Generator on the file, or to access the files through Athena/Spectrum. But if the schema keeps changing as you've observed, then it will be best to use their generated .SQL file.

Best regards,
Ian


Ed Sesek —

Thanks, I missed the SQL Script component. That should do the trick.


Ed Sesek —

Sorry, I'm still missing something here. The SQL that I need to execute is in an S3 object. The SQL Script component seems to only execute a SQL statement typed into the component text field. How would I execute the SQL stored in the S3 object?


Matillion Agent  

Harpreet Singh —

Hi Ed.
You can load the SQL into a table in the data warehouse, then use a Table Iterator component to read the SQL from that table and move it into a job variable. Once the SQL is available in the variable, you can use it in the SQL Script component. I know it's a slightly involved process, but it would get the job done.

Best
Harpreet


Matillion Agent  

Kalyan Arangam —

Hi Ed,

As I understand it, you have two assets:

  1. a .SQL file in S3 with a script to create the target table
  2. a .CSV file with the data

I would recommend using a Python Script component. You can read the script from the file in S3 and use the cursor object to execute it against Redshift. There is an example of using the cursor object in the Python Script component. Please note that you need the Jython interpreter (the default) to use the cursor object.
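
For illustration, a rough sketch of that approach might look like the following. This is only a sketch: the bucket and key names are placeholders, it assumes an S3 client such as boto is importable from the Jython interpreter (if not, fetch the .SQL file by some other means first), and the context.cursor() call follows the pattern shown in the Python Script component's built-in example.

    # Sketch only -- bucket/key names are placeholders.
    import boto

    s3 = boto.connect_s3()
    bucket = s3.get_bucket('my-billing-bucket')                    # placeholder bucket
    key = bucket.get_key('billing/2018-01-RedshiftCommands.sql')   # placeholder key
    sql_text = key.get_contents_as_string()

    # Execute the generated statements against Redshift via the cursor object.
    # Note: splitting on ';' is naive and assumes no semicolons inside string literals.
    cur = context.cursor()
    for statement in sql_text.split(';'):
        if statement.strip():
            cur.execute(statement)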

Then use an S3 Load component to load the .CSV file to the newly created table.

Hope that helps.

Best
Kalyan


Ed Sesek —

Thanks, folks. Let us look into both of those ideas and see what we can come up with.


Ed Sesek —

An option on the S3 Load component that would be really useful here is the ability to load only those columns from a CSV file whose header-row names match the column names in the target table. In the current situation, I have no control over the contents of the CSV file, but I do have control over the target table schema. I would like to load only the columns from the CSV that match the columns in my target table and ignore any other columns in the CSV.


Matillion Agent  

Kalyan Arangam —

Hi Ed,

S3 Load is just a front end for the COPY command in Redshift, so we are limited to the options Redshift itself provides:

https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-column-mapping.html

The only way I can think of is to load the complete file and then transfer the specific columns to your target table via a Transformation job. However, you may already be doing this!

Alternatively, you would need some external process to extract just those columns into a separate file and then load that into Redshift. I can't think of a specific tool off the top of my head, but it must be possible; a short script along the lines of the sketch below would do it. Are there any issues with your current process?
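
For example, a small pre-processing script along these lines could trim the CSV down to just the wanted columns before the load. This is only a sketch: the file names and the column list are placeholders, and it assumes a plain CSV with a header row.

    # Sketch only -- file names and the target column list are placeholders.
    import csv

    wanted = ['usage_start_date', 'usage_type', 'unblended_cost']  # columns present in the target table

    with open('billing_full.csv', 'rb') as src, open('billing_trimmed.csv', 'wb') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        # Keep only the CSV columns that also exist in the target table, in target-table order.
        keep = [header.index(c) for c in wanted if c in header]
        writer.writerow([header[i] for i in keep])
        for row in reader:
            writer.writerow([row[i] for i in keep])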

Best
Kalyan
