JDBC Incremental Load

This article is part of the series on Incremental Load Tools and Shared Jobs

 

Overview

The JDBC Incremental Load component is a tool designed to allow users to easily set up a Shared Job that will incrementally load from JDBC-compliant databases, rather than having to manually create such a job, which would require significantly more expertise.

The JDBC Incremental Load component can be found in the Tools section of the Components panel.

Below is an explanation of what users can expect to find on each section of the load wizard, and what each configuration option entails.

 

Authorisation details

On this first page, users can select the type of database to be accessed, and the authorisation details required to access it.

Property Setting Description
Database Type Select The type of database to be used. Currently supported database types:
 
Connection URL Text The connection URL for the chosen Database Type.
Username Text The username for the database authorisation.
Password Select The password for the database authorisation.

JDBC Incremental Load Wizard
 

Connection Options

A variety of Connection Options can be specified here. These are entirely optional. Details of connection options can be found in the below links:

  1. PostgreSQL
  2. MySQL
  3. Microsoft SQL Server
  4. IBM DB2
  5. Oracle

 

Data Sources

Users can then designate the data that they wish to load by choosing from a selection of Data Sources.

In this example, since we are setting up an incremental load, all Data Sources are available to choose from here, and can be included in the load. Columns for each Data Source are available in the following window:

Data Sources in the left-hand list can be included in the load. Data Sources in the right-hand list have been chosen for the load. Data Sources can be selected and moved from list to list using the arrow buttons.

 

Source List

Tables to be sourced from the database can also be specified by adding each table with the + button. Columns to be taken from each table can then be specified by using the Cogwheel icon by the table's name.

Each column has the option to be flagged as Incremental. Doing so will ensure that only data that exists after this increment (judged by either a future date or higher numeric value) is loaded into the table. This way, the JDBC Incremental Load component can be used to update an existing data set.

For example, you load 1000 rows of personnel data and each record has an 'ID' column that contains a unique numeric and you set that column to Incremental. The next time you perform this same load, the existing table data is queried and the largest ID number is found. Now only source data rows with an ID larger than that number are loaded into the target table, thus updating it.

JDBC Incremental Load Source List

 

Staging and Target Configuration

A variety of configuration options are available to users to customise their load, allowing them to specify where and how data is staged and loaded into tables.

Details on each option are given in the table below.

Property Setting Description
Staging Bucket Select The descriptive name for the component.
Staging Table Prefix Text The prefix given to tables this component will create.
Staging Schema Select The name of the Schema used to stage the data.
Target Table Prefix Text The prefix common to tables that will be created from the load. Default is empty (no prefix).
Target Schema Select The name of the Schema used to create the target tables.
Target Distribution Style Select Even - the default option, distribute rows around the Redshift Cluster evenly.

All - copy rows to all nodes in the Redshift Cluster.

Table-distribution is critical to good performance - see the Amazon Redshift documentation for more information.
Concurrency Select The number of concurrent queries. This value can be changed using the Alter WLM Slots component.
Property Setting Description
Staging Bucket Select The bucket used to stage the data.
Staging Table Prefix Text The prefix given to any tables that this component will create.
Staging Warehouse Select Choose a Snowflake warehouse that will stage the load.
Staging Database Select Choose a database to create each new table in.
Staging schema Select The name of the schema used to stage the data.
Target Table Prefix Text The prefix common to tables that will be created from the load. Default is empty (no prefix).
Target Warehouse Select Choose a Snowflake warehouse to load the data in.
Target database Select Decide which database will hold your new tables.
Target Schema Select The name of the schema used to create the target tables.
Concurrency Select Select between concurrent and sequential queries.

JDBC Incremental Load Wizard Redshift

JDBC Incremental Load Wizard Snowflake
JDBC Incremental Load Targeting Snowflake

 

Completion

Once the tool has been completed, a JDBC Load component will be present on the canvas. This is not identical to the Database Query component, but does utilise the same ideas to perform an incremental load from a JDBC-compliant database.

Clicking on this new JDBC Incremental Load component reveals several of the options given during the setup that can be easily changed at the user's discretion (Warning: This may cause the component to stop working and should never be done during a running workflow). This component can be used in all of the ways that any other component can be.

JDBC Incremental Load Properties Redshift
JDBC Incremental Load Properties Snowflake