Create Project (Delta Lake on Databricks)

    Overview

    A Matillion ETL project is a logical grouping of the configuration settings and resources, such as jobs, required to use Matillion ETL. When you first log in to a Matillion ETL instance, you must click Confirm in the Product Improvement Metrics dialog and then, if there are no existing projects to select, create a project.

    There are two routes to creating a new project:

    1. In the Join Project dialog, which appears automatically when an instance first loads, click Create Project.
    2. Click Project, then click Switch Project, and then click Create Project.

    There is no practical limit to the number of projects you can create. However, a client session uses only one project at a time, and each project must have a unique name.

    Note

    These instructions assume you have already successfully launched a Matillion ETL instance.


    Creating a Delta Lake on Databricks project on AWS

    The following section describes how to create a project in Matillion ETL for Delta Lake on Databricks (AWS).

    1. Project Details

    Complete the following details:

    • Project Group: Use the drop-down menu to choose an existing project group. Projects should be logically arranged in project groups.
    • Project Name: Enter a suitable name for your new project.
    • Project Description: Describe your project. This is optional.
    • Private Project: Select this to make this new project private. If the project is private, only users who have been granted access can view and work in it.
    • Include Samples: This is selected by default; clear it if you do not want to include sample jobs in this project.

    2. AWS Connection

    Complete the following details:

    • Environment Name: Enter a name for your new Matillion ETL environment.
    • AWS Credentials: Use the drop-down menu to choose credentials for the AWS cloud platform. Instance Credentials is selected by default. Click Manage to add a new set of credentials. Read Manage Credentials for more information.

    3. Delta Lake Connection

    Complete the following details:

    Note

    Before completing the following steps in the Create Project dialog, you must have an AWS Databricks account. This will enable you to deploy a Databricks workspace.

    • Workspace ID: Enter your existing Databricks workspace ID. This can be found as part of the URL of your Databricks workspace portal. Do not include cloud.databricks.com.
    • Username: Enter the username for your Databricks workspace account. Alternatively, you can enter the word "token". For more information, read How to Generate a New Databricks Token.
    • Password: Enter the password for your Databricks workspace account. Alternatively, provide the Token Value.

    Note

    The following combinations are available in the Username and Password fields:

    • Set Username as the account email. Set Password as the account password.
    • Set Username as the account email. Set Password as the token value.
    • Set Username as "token". Set Password as the token value.

    To test the connection, first make sure that every field in the Delta Lake Connection dialog is populated, and then click Test.
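
    The three Username and Password combinations listed in the note above can be written out as a small Python sketch. The function name credential_options and its parameters are hypothetical and exist only for illustration; they are not part of Matillion ETL or Databricks, and the function does not contact either service.

```python
from typing import List, Optional, Tuple


def credential_options(
    email: str,
    account_password: Optional[str] = None,
    token_value: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return the (Username, Password) pairs described for the AWS dialog.

    Hypothetical helper: it only restates the combinations listed in the
    note above, based on which secrets you have available.
    """
    options: List[Tuple[str, str]] = []
    if account_password:
        # Username = account email, Password = account password.
        options.append((email, account_password))
    if token_value:
        # Username = account email, Password = token value.
        options.append((email, token_value))
        # Username = the literal word "token", Password = token value.
        options.append(("token", token_value))
    return options
```

    For example, if you only have a token, the sketch returns the two token-based pairs and omits the email and password pair.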

    4. Delta Lake Defaults

    Complete the following details:

    Click Finish to create your project and environment.


    Creating a Delta Lake on Databricks project on Azure

    The following section describes how to create a project in Matillion ETL for Delta Lake on Databricks (Azure).

    1. Project Details

    Complete the following details:

    • Project Group: Use the drop-down menu to choose an existing project group. Projects should be logically arranged in project groups.
    • Project Name: Enter a suitable name for your new project.
    • Project Description: Describe your project. This is optional.
    • Private Project: Select this to make this new project private. If the project is private, only users who have been granted access can view and work in it.
    • Include Samples: This is selected by default; clear it if you do not want to include sample jobs in this project.

    2. Cloud Connection

    Complete the following details:

    • Environment Name: Enter a name for your new Matillion ETL environment.
    • Azure Credentials: Use the drop-down menu to choose credentials for the Azure cloud platform. Instance Credentials is selected by default. Click Manage to add a new set of credentials. Read Manage Credentials for more information.

    Note

    Ensure your Instance Credentials are correctly configured for the required cloud platform. For example, the Azure Blob Storage Load component relies on credentials with access to Blob Storage.

    3. Delta Lake Connection

    Complete the following details:

    Note

    Before completing the following steps in the Create Project dialog, you must have a Microsoft account that you can use to sign in to the Microsoft Azure portal. This will enable you to deploy a Databricks workspace.

    • Workspace ID: Enter your existing Databricks workspace ID. This can be found as part of the URL of your Azure Databricks Workspace portal. Do not include azuredatabricks.net.
    • Username: The word "token" appears by default. You do not need to change this. For more information about tokens and setting up this type of authentication for your Databricks workspace account, read Authentication using Azure Databricks personal access tokens.
    • Password: Enter the Token Value for your Databricks workspace account.

    To test the connection, first make sure that every field in the Delta Lake Connection dialog is populated, and then click Test.
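
    Both the AWS and Azure instructions say to take the Workspace ID from the workspace URL and leave out the domain (cloud.databricks.com or azuredatabricks.net). The following Python sketch shows one way to do that, assuming the Workspace ID is everything in the host name that precedes the domain. The function name workspace_id_from_url is hypothetical, and the host names in the usage example are made up.

```python
from urllib.parse import urlparse

# Domain suffixes that the article says to leave out of the Workspace ID field.
DOMAIN_SUFFIXES = (".cloud.databricks.com", ".azuredatabricks.net")


def workspace_id_from_url(url: str) -> str:
    """Return the part of the workspace host name that precedes the domain.

    Hypothetical helper for illustration; it assumes the Workspace ID is
    the host name with the Databricks domain suffix removed.
    """
    # urlparse only recognizes the host when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""
    for suffix in DOMAIN_SUFFIXES:
        if host.endswith(suffix):
            return host[: -len(suffix)]
    raise ValueError(f"Not a recognized Databricks workspace URL: {url!r}")


# Usage with made-up host names.
print(workspace_id_from_url("https://example-workspace.cloud.databricks.com/login"))
print(workspace_id_from_url("adb-123.4.azuredatabricks.net"))
```

    Raising an error for unrecognized domains, rather than returning the input unchanged, makes it harder to paste a full URL into the Workspace ID field by mistake.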

    4. Delta Lake Defaults

    Complete the following details:

    Click Finish to create your project and environment.


    Cluster states

    In Matillion ETL, each cluster in the Cluster drop-down menu is assigned a state, with a Databricks equivalent. See the table below for more information:

    Matillion ETL    Databricks
    STOPPED          Terminated
    STARTING         Pending
    RUNNING          Running

    When a cluster is not running, databases won't be retrieved, and the Database drop-down menu won't offer any selections. Attempting to select a database on a cluster in the STOPPED state automatically triggers that cluster to start. It can take a few minutes for the cluster to move from STOPPED to RUNNING, and it remains in the STARTING state during this time.
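
    The state table and the rule about the Database drop-down menu can be summarized in a short Python sketch. The names below are hypothetical and only restate the three states listed in this article; any other state Databricks might report is outside the table and is rejected here.

```python
# Databricks state -> Matillion ETL state, as listed in the table above.
DATABRICKS_TO_MATILLION = {
    "TERMINATED": "STOPPED",
    "PENDING": "STARTING",
    "RUNNING": "RUNNING",
}


def matillion_state(databricks_state: str) -> str:
    """Translate a Databricks cluster state into its Matillion ETL equivalent."""
    key = databricks_state.upper()
    if key not in DATABRICKS_TO_MATILLION:
        # States not covered by the article's table are not guessed at.
        raise ValueError(f"State not covered by the table: {databricks_state!r}")
    return DATABRICKS_TO_MATILLION[key]


def databases_listed(state: str) -> bool:
    """Databases are only retrieved while the cluster is RUNNING."""
    return state == "RUNNING"


for name in ("Terminated", "Pending", "Running"):
    state = matillion_state(name)
    print(name, "->", state, "| databases listed:", databases_listed(state))
```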

    Clicking Back and then returning to Delta Lake Defaults refreshes the state of the clusters. This action is required to see when a cluster has moved from STOPPED to STARTING or RUNNING. Refreshing the state of the clusters also reloads the Database drop-down menu.
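
    Because the displayed state only changes when it is refreshed, waiting for a cluster amounts to refreshing repeatedly until it reports RUNNING. The Python sketch below models that loop. The refresh_state callable is a hypothetical stand-in for clicking Back and returning to Delta Lake Defaults; it is not a Matillion ETL or Databricks API, and the sequence of states in the usage example is made up.

```python
import time
from typing import Callable, List


def wait_until_running(
    refresh_state: Callable[[], str],
    max_refreshes: int = 10,
    delay_seconds: float = 0.0,
) -> List[str]:
    """Refresh the cluster state until it is RUNNING and return every state seen.

    refresh_state is a hypothetical callable standing in for the manual
    refresh described above.
    """
    seen: List[str] = []
    for _ in range(max_refreshes):
        state = refresh_state()
        seen.append(state)
        if state == "RUNNING":
            return seen
        # Pause before the next refresh; a cluster can take a few minutes to start.
        time.sleep(delay_seconds)
    raise TimeoutError(f"Cluster not RUNNING after {max_refreshes} refreshes")


# Usage with a made-up sequence of states.
states = iter(["STOPPED", "STARTING", "STARTING", "RUNNING"])
print(wait_until_running(lambda: next(states)))
```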


    Next steps

    When you first log in to Matillion ETL, we recommend that you replace your default username and password with your own secure login credentials. For more information about changing these credentials, read User Configuration in the Admin Menu.