Getting Started with Amazon Redshift Spectrum
    • Dark
      Light

    Getting Started with Amazon Redshift Spectrum

    • Dark
      Light

    Article Summary

    Overview

    Amazon Redshift Spectrum allows users to create external tables, which reference data stored in Amazon S3, allowing transformation of large data sets without having to host the data on Redshift. This is done through Amazon Athena, which allows SQL queries to be made directly against data in S3.

    To enable these services in Matillion ETL, users are required to add relevant permissions to their Matillion ETL instance through the AWS console, as well as creating an external schema.



    Create A Role

    Log into your AWS account console and through the main upper-left dropdown menu, select 'IAM'. This will take you to a new page where you can then browse to the 'Roles' section via the list at the left-hand side.

    1. A new Role can be created through the 'Create New Role' button at the top-left.
    2. Select the 'Amazon Redshift' option when asked to select a role type.
    3. Tick boxes for the following roles:
      • AmazonS3FullAccess
      • AmazonAthenaFullAccess
    4. Press 'Next Step' at the bottom right.
    5. Enter a role name and description of your choice.
    6. Press 'Create role' at the bottom right.

    Your new role will be instantly created. Remember its name for the next step.

    Alternately, an existing role can be selected and the same policies can be added through the 'Attach Policy' button. In either case, selecting the role intended for Amazon Redshift Spectrum use will give an overview of the role and allow you to take note of the Role ARN, which is used in the next section.



    Attaching A Role

    1. Log into your AWS account console and through the main upper-left dropdown menu, select 'Redshift'.
    2. From the Redshift page, select Clusters.
    3. Check the box of the cluster you wish to allow Amazon Redshift Spectrum on (the cluster that the Matillion ETL instance will be using for Redshift).
    4. For Actions Click Manage IAM Roles.
    5. On the new page, select the role created in the previous section from the list of available roles.
    6. Click Done to give this role to the cluster.

    Once complete, your Matillion ETL client is ready to be configured for use with Amazon Redshift Spectrum.



    Configuring The Matillion ETL Client

    1. Log into your Matillion ETL instance
    2. Expand the 'Environments' list at the bottom-left of the client.
    3. Right click on the intended environment (one that is associated with the Redshift cluster we previously enabled Amazon Redshift Spectrum policies on).
    4. Select 'Create External Schema' from the right-click menu.
    1. Enter a name for your new external schema. Ensure this name does not already exist as a schema of any kind.
    2. Add the name of your athena data catalog. A new catalog will be created if this name is not found.
    3. Add the Role ARN of the role used to allow Amazon Redshift Spectrum as defined in the previous section.

    Now components within Matillion that make use of external tables (and thus, Amazon Redshift Spectrum) can be used providing they use this external schema. Note that any tables that exist within the linked Athena Catalog will be instantly available for use in Matillion ETL through the external schema that links to them.



    Amazon Redshift Spectrum-Ready Components

    External Tables can be created using the Create External Table component and overwritten using the Rewrite External Table component. These components can write external tables in the following formats:

    • PARQUET
    • RCFILE
    • SEQUENCEFILE
    • TEXTFILE

    Query components in Matillion ETL are ready to use external tables with only a minor amount of configuration. In each of these components, locate the 'Type' property and set it to 'External' from the dropdown list. Then, ensure that the schema specified in that component's 'Schema' property is an external schema. Available tables in the 'Target Table' property will then be external tables available to that schema.


    Most data staging ('Query') components can make use of these external tables.



    IAM Policies for Amazon Redshift Spectrum

    Many fine-grained permissions are available and discussed within this article: IAM Policies for Amazon Redshift Spectrum. Users may be particularly interested in the section Policies to Grant Minimum Permissions.



    Contact Support

    If you require assistance using Amazon Redshift Spectrum, visit our Getting Support page.