JDBC Incremental Load
The JDBC Incremental Load component is a tool that lets users easily set up a Shared Job that incrementally loads data from JDBC-compliant databases, rather than manually creating such a job, which would require significantly more expertise.
The JDBC Incremental Load component can be found in the Tools section of the Components panel.
Below is an explanation of what users can expect to find on each section of the load wizard, and what each configuration option entails.
On this first page, users can select the type of database to be accessed, and the authorisation details required to access it.
| Property | Type | Description |
| --- | --- | --- |
| Database Type | Select | The type of database to be used. Supported database types are listed in the drop-down. |
| Connection URL | Text | The connection URL for the chosen Database Type. |
| Username | Text | The username for the database authorisation. |
| Password | Select | The password for the database authorisation. |
A variety of Connection Options can be specified here. These are entirely optional. Details of connection options can be found in the links listed below.
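As a rough illustration of how such options relate to the Connection URL: JDBC drivers typically accept connection options as key=value query parameters appended to the URL. The sketch below shows this pattern in Python; the option names used (`connectTimeout`, `useSSL`) are illustrative examples, not a fixed list for any particular driver.

```python
# Minimal sketch: appending optional connection parameters to a JDBC base URL.
# Option names here are hypothetical examples; consult your driver's docs.
from urllib.parse import urlencode

def build_jdbc_url(base_url: str, options: dict) -> str:
    """Append optional connection parameters to a JDBC base URL."""
    if not options:
        return base_url
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(options)

url = build_jdbc_url(
    "jdbc:mysql://db.example.com:3306/sales",
    {"connectTimeout": "10000", "useSSL": "true"},
)
print(url)  # jdbc:mysql://db.example.com:3306/sales?connectTimeout=10000&useSSL=true
```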
Users can then designate the data that they wish to load by choosing from a selection of Data Sources.
In this example, since we are setting up an incremental load, all Data Sources are available to choose from here, and can be included in the load. Columns for each Data Source are available in the following window:
Data Sources in the left-hand list can be included in the load. Data Sources in the right-hand list have been chosen for the load. Data Sources can be selected and moved from list to list using the arrow buttons.
Tables to be sourced from the database can also be specified by adding each table with the + button. Columns to be taken from each table can then be specified by using the Cogwheel icon by the table's name.
Each column has the option to be flagged as Incremental. Doing so ensures that only data beyond the current increment (judged by a later date or a higher numeric value) is loaded into the table. This way, the JDBC Incremental Load component can be used to update an existing data set.
For example, suppose you load 1000 rows of personnel data, where each record has an 'ID' column containing a unique number, and you flag that column as Incremental. The next time you perform the same load, the existing table data is queried to find the largest ID. Only source rows with an ID larger than that value are then loaded into the target table, thus updating it.
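The high-water-mark logic described above can be sketched as a runnable example. This uses Python's built-in sqlite3 as a stand-in for a JDBC source and target; the table names and column are illustrative only.

```python
# Sketch of incremental loading: find the highest value already present in the
# target for the Incremental column, then pull only source rows beyond it.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")

# Initial load: three rows exist in both source and target.
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
conn.execute("INSERT INTO target SELECT * FROM source")

# New rows later arrive in the source.
conn.executemany("INSERT INTO source VALUES (?, ?)", [(4, "d"), (5, "e")])

# Incremental load: query the existing high-water mark, load only newer rows.
(high_water,) = conn.execute("SELECT MAX(id) FROM target").fetchone()
conn.execute("INSERT INTO target SELECT * FROM source WHERE id > ?",
             (high_water,))

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 5 — rows 4 and 5 were added without reloading rows 1-3
```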
Staging and Target Configuration
A variety of configuration options are available to users to customise their load, allowing them to specify where and how data is staged and loaded into tables.
Details on each option are given in the table below.
Redshift:

| Property | Type | Description |
| --- | --- | --- |
| Staging Bucket | Select | The S3 bucket used to stage the data. |
| Staging Table Prefix | Text | The prefix given to tables this component will create. |
| Staging Schema | Select | The name of the schema used to stage the data. |
| Target Table Prefix | Text | The prefix common to tables that will be created from the load. |
| Target Schema | Select | The name of the schema used to create the target tables. |
| Target Distribution Style | Select | Even (the default): distribute rows around the Redshift cluster evenly. All: copy rows to all nodes in the Redshift cluster. Table distribution is critical to good performance; see the Amazon Redshift documentation for more information. |
| Concurrency | Select | The number of concurrent queries. This value can be changed using the Alter WLM Slots component. |
Snowflake:

| Property | Type | Description |
| --- | --- | --- |
| Staging Bucket | Select | The bucket used to stage the data. |
| Staging Table Prefix | Text | The prefix given to any tables that this component will create. |
| Staging Warehouse | Select | The Snowflake warehouse that will stage the load. |
| Staging Database | Select | The database in which to create each new staging table. |
| Staging Schema | Select | The name of the schema used to stage the data. |
| Target Table Prefix | Text | The prefix common to tables that will be created from the load. Default is empty (no prefix). |
| Target Warehouse | Select | The Snowflake warehouse used to load the data. |
| Target Database | Select | The database that will hold the new target tables. |
| Target Schema | Select | The name of the schema used to create the target tables. |
| Concurrency | Select | Select between concurrent and sequential queries. |
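The options above configure a stage-then-load pattern: data lands in a prefixed staging table first, and is only promoted to the target table once staging succeeds. A minimal sketch of that pattern, again using sqlite3 as a stand-in; the `stg_` prefix and table names are hypothetical examples:

```python
# Sketch of the stage-then-load pattern: write to a prefixed staging table,
# promote the staged rows to the target, then drop the staging table.
import sqlite3

conn = sqlite3.connect(":memory:")
staging_prefix, target_table = "stg_", "orders"
staging_table = staging_prefix + target_table

# Stage the incoming batch in its own table.
conn.execute(f"CREATE TABLE {staging_table} (id INTEGER, amount REAL)")
conn.executemany(f"INSERT INTO {staging_table} VALUES (?, ?)",
                 [(1, 9.99), (2, 24.50)])

# Promote staged rows to the target table, then clean up the staging table.
conn.execute(f"CREATE TABLE {target_table} AS SELECT * FROM {staging_table}")
conn.execute(f"DROP TABLE {staging_table}")

rows = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
print(rows)  # 2
```

Keeping the staging table separate means a failed load never leaves the target table partially written.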
Once the wizard has been completed, a JDBC Incremental Load component will be present on the canvas. It is not identical to the Database Query component, but it uses the same principles to perform an incremental load from a JDBC-compliant database.
Clicking on this new JDBC Incremental Load component reveals several of the options chosen during setup, which can be changed at the user's discretion. (Warning: changing these options may cause the component to stop working, and should never be done while a workflow is running.) Otherwise, this component can be used in all of the ways that any other component can.