Managing Credentials

Matillion ETL can integrate with other platform-specific services in your account, provided you have authorised it to do so. This is not compulsory, although some parts of the product (such as S3 data loading) will not work without appropriate authorisation.

You can grant access to other AWS services using an Access Policy.

Note: Each Matillion ETL instance takes a single set of GCP credentials. If you want to set up a new Matillion ETL instance to work with a new GCP Project, it is advised that you follow the steps in GCP Account Setup for BigQuery and Storage and then Launching Matillion ETL for BigQuery, in that order.

You can grant access to other GCP services using Access Control.
 

 

Adding an AWS Access Policy

There are two ways you can give access to Matillion ETL.

1. With instance credentials - This is done by specifying an IAM Role for the EC2 instance at launch time. Once in place, you can add and remove permissions from the chosen IAM Role and the changes take effect immediately. This gives you fine-grained control over exactly which AWS services Matillion ETL is allowed to use. This method is easiest to manage if you attach the role to the instance while it is being launched; however, it is also possible to attach an IAM Role to a running instance using the AWS command line, as shown in the sketch after this list.

2. With existing user-defined credentials - These can be defined via the Matillion ETL user interface at any time. The credentials are created via an IAM User, and Matillion will be able to access whatever that IAM User has access to. You simply enter an Access Key ID and a Secret Access Key.
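
For example, a role (wrapped in an instance profile) can be attached to a running instance with the AWS CLI. This is a minimal sketch only; the instance ID and profile name below are hypothetical placeholders for your own values:

    # Attach an existing instance profile to a running EC2 instance.
    # Both identifiers are hypothetical examples.
    aws ec2 associate-iam-instance-profile \
        --instance-id i-0123456789abcdef0 \
        --iam-instance-profile Name=MatillionInstanceProfile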

Depending on your company’s security policy, you may choose to use either:

Coarse-grained access control - easier to administer, but typically grants many more privileges than are necessary.

Fine-grained access control - requires you to specify a more complex policy but follows the principle of least privilege. A template for this policy type is attached to this document for the user's convenience.
 

AWS Coarse-grained access control

With this approach, you simply attach the following Managed Policies to the role you are using.

Managed Policy Purpose

AmazonRedshiftReadOnlyAccess

This policy allows Matillion to read metadata about the Redshift clusters, such as endpoint and port number. This makes it easier to set up new projects and environments. If you don't grant this policy, you will need to type in all the information manually.

It does not allow Matillion to read data from Redshift (that privilege is granted by you providing a username and password, and by you permitting network connectivity between Matillion and Redshift).

AmazonS3FullAccess

This policy is used to read and write data in S3 buckets. It is used in almost every load component, since they almost all either read directly from S3 or else use an S3 Staging area.

AmazonSNSFullAccess

This policy is used when creating SNS topics and publishing messages to them. This is the purpose of the SNS Message Component.

AmazonSQSFullAccess

This policy enables Matillion to read from an SQS queue, or to write an SQS message using the SQS Message Component.

CloudWatchFullAccess

This policy is required to publish metrics to Amazon CloudWatch from the CloudWatch Publish component.

AmazonRDSReadOnlyAccess

This policy is used by the RDS Query and RDS Bulk Output components to discover information about the available RDS servers in the region.
Note that without this policy, Matillion may still be able to connect to RDS provided the network path is set up and a username and password have been provided.

AmazonEC2FullAccess

This policy is used when performing automated backups.



Note there is no single Managed Policy which can be used to authorize KMS access in the Manage Passwords dialog. Please use the KMS table in the fine-grained access control section below.
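
If you manage the role yourself, these managed policies can be attached from the AWS command line. A sketch only, assuming a role named MatillionRole (a placeholder for your own role name):

    # Attach each AWS managed policy listed above to the instance role.
    # "MatillionRole" is a hypothetical role name; substitute your own.
    for policy in AmazonRedshiftReadOnlyAccess AmazonS3FullAccess \
                  AmazonSNSFullAccess AmazonSQSFullAccess \
                  CloudWatchFullAccess AmazonRDSReadOnlyAccess \
                  AmazonEC2FullAccess; do
        aws iam attach-role-policy \
            --role-name MatillionRole \
            --policy-arn "arn:aws:iam::aws:policy/${policy}"
    done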


AWS Fine-grained access control

This section lists all the IAM privileges which Matillion ETL can require during normal operation. Privileges are shown as Actions from AWS policy language version "2012-10-17". The list does not take into account any privileges required by custom AWS requests made in Bash or Python scripts.

There are no mandatory IAM Actions which Matillion absolutely requires. However, the recommended Actions listed in the next section are the practical minimum for productivity and ease of use.
 

Recommended Actions Description

redshift:DescribeClusters

During first login, allows Matillion to find Redshift clusters. Also used when creating a new environment. Note this privilege is not required in order to use a particular Redshift cluster: it’s just useful to make setup faster and less error-prone.

s3:ListAllMyBuckets

Enables every S3 Staging Area drop-down list to work. Used in every Orchestration data load component.

s3:ListBucket

Enables the operator to view folders inside buckets. Can be used in every Orchestration data load component.

s3:GetObject

Enables Matillion to read S3 files. Used in every Orchestration data load component.

s3:PutObject

Used by every Orchestration data load component to create temporary files in a nominated staging bucket.

The files are used temporarily in order to take advantage of Redshift’s bulk loader and transfer the data as quickly as possible into Redshift. This Action could be restricted to specific staging buckets.

 

s3:DeleteObject

Used for the same purpose as s3:PutObject, to remove temporary S3 files created during data loads.

s3:GetBucketLocation

Uses the location subresource to return a bucket's AWS region.
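
Taken together, the recommended Actions above can be granted as a single inline policy. The following is a sketch only: the file name, role name and policy name are hypothetical, and the open "Resource" could be tightened to specific staging buckets as noted above. Save the policy document as matillion-recommended.json:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "redshift:DescribeClusters",
            "s3:ListAllMyBuckets",
            "s3:ListBucket",
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:GetBucketLocation"
          ],
          "Resource": "*"
        }
      ]
    }

Then attach it inline to the role:

    # "MatillionRole" and "MatillionRecommended" are placeholder names.
    aws iam put-role-policy \
        --role-name MatillionRole \
        --policy-name MatillionRecommended \
        --policy-document file://matillion-recommended.json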


Automated backups can be enabled from the Project → Manage Backups menu. Backups require the EC2 actions detailed below.
 

EC2 Actions Description

ec2:CreateSnapshot

Used during creation of the EC2 snapshot.

ec2:CreateTags

Used to tag the generated snapshot with a name.

ec2:DescribeInstances

Required to retrieve instance metadata.

ec2:DescribeVolumes

Required to retrieve instance metadata.



This section lists the actions required in order to use the KMS option in the Manage Passwords dialog.
Note that in addition to the below, the chosen KMS key must be:
  • In the same Region as Matillion
  • Enabled
 
KMS Action Description

kms:ListAliases

Enables Matillion to populate the "Master Key" dropdown by listing all the KMS aliases which are associated with a key.

kms:Encrypt

Enables Matillion to store an encrypted password.

kms:Decrypt

Enables Matillion to retrieve and use an encrypted password.
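
Both key requirements can be checked from the AWS command line before configuring Manage Passwords. A sketch, with the region and alias name as placeholders:

    # List the aliases visible in the instance's region; the chosen
    # key must appear here, and describe-key should report it Enabled.
    aws kms list-aliases --region eu-west-1
    aws kms describe-key --key-id alias/matillion-passwords --region eu-west-1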
 

In the following tables, we outline the actions for RDS, SQS, SNS and CloudWatch required by Matillion to perform certain functions.
 

RDS Actions Description

rds:DescribeDBInstances

Enables Matillion to list the endpoints of the chosen RDS database type. Note this privilege is not required in order to query data from any RDS instance.


 

SQS Actions Description

sqs:ListQueues

Enables the operator to select an SQS queue in the main SQS Configuration window.

sqs:GetQueueUrl

Required in order to monitor queues.

sqs:ReceiveMessage

Enables Matillion to receive messages from SQS (e.g. using the SQS Message orchestration component).

sqs:DeleteMessage

Enables Matillion to receive messages from SQS and remove them from the queue.

sqs:SendMessage

Enables Matillion to send messages to SQS.


 

SNS Actions Description

sns:Publish

Required in order to publish messages with the SNS Message component.

sns:ListTopics

Enables the operator to select from a list of existing SNS topics when using the SNS Message component.

sns:CreateTopic

If the user manually types in a topic name which doesn’t already exist in that region, Matillion tries to create the new topic, which requires this privilege.


 

CloudWatch Actions Description

cloudwatch:ListMetrics

Required to enable the CloudWatch Publish component to list existing metrics.

cloudwatch:PutMetricData

Required in order to publish metrics to CloudWatch using Matillion’s CloudWatch Publish component. This Action also includes the authorization to create new Custom Namespaces and new Metrics.
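
A quick way to confirm the listing permissions in these tables is to make the equivalent calls from the AWS command line using the same credentials. A sketch:

    # Each call exercises one of the Actions described above.
    aws rds describe-db-instances
    aws sqs list-queues
    aws sns list-topics
    aws cloudwatch list-metrics --namespace "AWS/EC2"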

 
 

GCP & BigQuery Roles


When using Matillion ETL for BigQuery, or even when using BigQuery components in Matillion ETL for other platforms, the user is required to have access to a GCP account with the following roles.

At the current time, Matillion ETL uses the BigQuery Admin role:

roles/bigquery.admin

The BigQuery Admin role includes the following roles:

Role Description

roles/bigquery.user

Provides permissions to run jobs, including queries, within the project.
roles/bigquery.dataViewer

When applied to a dataset, dataViewer provides permissions to:

  • Read the dataset's metadata and list tables in the dataset.
  • Read data and metadata from the dataset's tables.

When applied at the project or organization level, this role can also enumerate all datasets in the project. Additional roles, however, are necessary to allow the running of jobs.

roles/bigquery.dataEditor

When applied to a dataset, dataEditor provides permissions to:

  • Read the dataset's metadata and list tables in the dataset.
  • Create, update, get, and delete the dataset's tables.

When applied at the project or organization level, this role can also create new datasets.

roles/bigquery.dataOwner

When applied to a dataset, dataOwner provides permissions to:

  • Read, update, and delete the dataset.
  • Create, update, get, and delete the dataset's tables.

When applied at the project or organization level, this role can also create new datasets.


Matillion ETL also requires the Storage Admin role:

roles/storage.admin

The Storage Admin role includes the following roles:

Role Description

roles/storage.objectCreator

Allows users to create objects. Does not give permission to delete or overwrite objects.

roles/storage.objectViewer

Grants access to view objects and their metadata, excluding ACLs.

roles/storage.objectAdmin

Grants full control of objects.
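
Both top-level roles are typically granted to the service account that Matillion will use. A sketch with gcloud; the project ID and service account address are placeholders:

    # Grant the two roles Matillion requires to a hypothetical
    # service account in project PROJECT_ID.
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:matillion@PROJECT_ID.iam.gserviceaccount.com" \
        --role="roles/bigquery.admin"
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:matillion@PROJECT_ID.iam.gserviceaccount.com" \
        --role="roles/storage.admin"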
 

Managing and Testing AWS Credentials

When using Matillion ETL, credentials are attached to your Environment definition.

To Manage Credentials:

  • From the Project menu, choose Manage Credentials.
  • To check whether your Instance Credentials are set and working, click Test in the Instance Credentials section.
  • To add user-defined Credentials:
    • Click the + button.
    • Add a Name, Access Key ID and Secret Access Key, then click Test.

Note: It's OK to get a warning at this point if you are not adding all access policies. The output should indicate which policies you have set up successfully.
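
Independently of the Test button, you can check which identity a key pair (or the instance role) resolves to from the AWS command line. A sketch:

    # Shows the account ID and ARN behind the currently active
    # credentials; run it with the keys you are about to register.
    aws sts get-caller-identity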
 



To add credentials to an environment:
  • Expand the Environment panel and choose the environment you wish to modify.
  • In the Credentials section, choose either "Instance Credentials" (default) or the name of any user-defined credentials you have created (in the example below, "Manual Credentials").



 

Managing and Testing GCP Credentials

Begin by launching your Matillion instance and, if you do not already have a project, opting to create one when prompted.

The Create Project menu asks for a Project Group and Name, which can be anything. Default Project and Default Dataset can be set to your Project ID and Dataset ID, as noted earlier. The Environment name can be set to anything.

To enable a connection to the GCP services, you must enter your Service Account key into the Matillion Credentials Manager. To do this, select 'Manage' beside the 'GCP Credentials' field. If you are not creating a new project, you can access the Credentials Manager through Project → Manage Credentials.

This new window will allow you to manage the credentials Matillion ETL will use to access platform-specific services. Ensure the tab is set to 'GCP' and click the + icon to add a new set of credentials. Name it whatever you wish, then select 'Browse...' and find the JSON file downloaded earlier from the GCP Service Account. This file is all that is required for Matillion to use your GCP project.
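
If you still need a key file, one can be downloaded from the GCP console or created with gcloud. A sketch; the output file name and service account address are placeholders:

    # Create and download a JSON key for the service account
    # that Matillion will use (hypothetical names).
    gcloud iam service-accounts keys create matillion-key.json \
        --iam-account=matillion@PROJECT_ID.iam.gserviceaccount.com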

Clicking 'Test' should report success for BigQuery and GoogleCloudStorage. Clicking 'OK' will take you back to the Create Project screen, where you can select 'Test' again to ensure all details are correct. If either test fails, double-check the information you entered. If this still fails to produce a successful Test, it is recommended that you contact Matillion Support directly.

With a successful test, clicking 'OK' will create your project, and you are free to use Matillion and the supported GCP features.