Authentication With API Profiles
This document builds on an earlier article "Writing API Profiles", and describes how to add authentication to an API Profile that you have created.
To recap, a Matillion API Profile contains RSD “files”, each of which converts the hierarchical XML or JSON data from an API endpoint into a rectangular format that can be used by target platforms. Most real API's require some form of authentication, for example in the form of an API key, or a username and password.
This document demonstrates various techniques for adding authentication to an RSD file.
Matillion’s own API (JSON)
The first authentication example looks at Matillion’s own internal REST API. This uses JSON and requires HTTP Basic authentication, which is a very common design pattern.
You will find a built-in API Profile named “Matillion API”, containing two RSD files: Run History Details.rsd and Run History Summary.rsd.
You can run these with an API Query component in a simple Orchestration job. Choose the Matillion API profile, and Run History Summary as the Data Source, with just the id column for now.
When the component is fully configured with an S3 Staging Area and a Target Table, if you run it the job will fail with an authentication error.
The API call failed because it requires authentication, and none was provided.
If you look at the built-in RSD files in the Matillion API Profile, there are no authentication statements at all. Instead of editing the RSD files, we’ll take advantage of a shortcut that Matillion offers.
HTTP Basic authentication using Matillion Connection Options
The Connection Options of an API Query component allow you to use HTTP Basic authentication with no modification to an RSD file. The parameters are:
- authscheme: set to Basic
- user: a Matillion user (which must be a member of the API group)
Note that the chosen user must be a member of the API group, otherwise the job will fail again with the same error. Please refer to “Enabling API for users” in the Matillion ETL API documentation.
Behind the scenes, the API Query component’s authscheme Connection Option works by setting a special HTTP header on your behalf. In the next example, we’ll do this manually instead by making a change to the RSD file.
Dotmailer provide a JSON-based REST API which allows you to query many of the entities, such as campaigns and contacts. The API is documented here.
We'll use the "get all campaigns" web service, which returns all known campaigns in batches of up to 500. Refer to “Writing API Profiles” to set up a new Matillion API Profile named DotMailer, with an RSD file named Campaigns. The RepeatElement is just the document root “/”, and the URI is set (from the Dotmailer API documentation) to https://r1-api.dotmailer.com/v2/campaigns. Please refer to the attached script DotmailerCampaigns.rsd.txt for the exact syntax.
The Dotmailer API does require authentication. There is a useful dummy account for testing purposes which has username "email@example.com" and password "demo".
You’ll see from the Dotmailer API documentation that it can use HTTP Basic authentication. This involves encoding the username and password, and using the resulting text string in an HTTP request header named “authorization”. We could have Matillion do this on our behalf, as shown in the previous example in this article. But this time we’ll do it manually by making the RSD file set this specific HTTP request header.
Encoding the credentials.
The first step is to encode the username and password in the correct way. On Linux (for example, on your Matillion instance) you can use the base64 command from the bash shell, like this:
echo -n "firstname.lastname@example.org:demo" | base64
On Windows you can do the encoding using a PowerShell script, like this:
$b = [System.Text.Encoding]::UTF8.GetBytes("email@example.com:demo") [System.Convert]::ToBase64String($b)
Setting the HTTP request header
Now the encoded value can be used in the API query, by setting the "authorization" HTTP header, with these two lines:
<rsb:set attr="Header:Name#" value="authorization" />
<rsb:set attr="Header:Value#" value="Basic ZGVtb0BhcGljb25uZWN0b3IuY29tOmRlbW8=" />
Verifying the output
You should now be able to use the Campaigns RSD file in a Matillion API Query component, without setting any Connection Options. The dummy account returns five records.
You can write a simple orchestration job to test this, or can import the attached job APIQueryDotmailer.json into your Matillion environment.
Custom HTTP headers
Building on the above technique, you can use pairs of <rsb:set> declarations to set any HTTP headers that your API needs. For example, to set two headers, you would specify:
<rsb:set attr="Header:Name#" value="CustomHeader1" />
<rsb:set attr="Header:Value#" value="Value1" />
<rsb:set attr="Header:Name#" value="CustomHeader2" />
<rsb:set attr="Header:Value#" value="Value2" />
Setting custom headers will enable you to authenticate with most REST style API’s. Some require a separate username and password, and others just require a “token” or “key” which you set in a specifically named header.
However to work with SOAP API’s you’ll need to do something slightly different, which we’ll look at in the next example.
A SOAP API example (XML)
This next example shows how Matillion can invoke a SOAP API, with authentication credentials embedded into the SOAP request. We’ll use the UK National Rail’s Live Departure Boards Web Service, which is documented here. The endpoint in this case is the GetDepartureBoard service, which can be made to list the trains departing from one named station to another, within a defined time window.
Refer again to “Writing API Profiles” to set up a new Matillion API Profile named OpenLDBWS, with an RSD file named Departure Board. The URI is set to https://lite.realtime.nationalrail.co.uk/OpenLDBWS/ldb9.asmx. The RepeatElement locates the part of the SOAP response which lists the matching train services, which according to the API documentation is located at /Envelope/Body/GetDepartureBoardResponse/GetStationBoardResult/trainServices/service.
Note that the xPath value here is not namespace-aware, unlike most other XML processors. So you don’t need to include the namespace prefixes in the xPath expression of the RepeatElement.
Please refer to the attached script Departure Board.rsd.txt for the exact syntax.
RSD script adaptations for SOAP
Making a SOAP call does require a few other adaptations to the basic RSD script. These include:
- Line 17 - explicitly setting the HTTP Content-Type header
- Lines 19 and 20 - set a custom HTTP header which is required by the API
- Line 24 - requesting an HTTP POST operation (the default is GET)
- Line 45 - using the xmlproviderGet operation, since SOAP is an XML-based protocol
- Line 27 to 43 - defining the SOAP request payload as required by the API.
Note that the SOAP request payload includes the query parameters (lines 35-40) and the authentication token (at line 30).
You’ll need your own, valid token in order to use this API. Tokens are free, but you do first need to register, by providing a few details at http://realtime.nationalrail.co.uk/OpenLDBWSRegistration.
After you have registered, they will send the token to you via email. It’s a 16-byte hex string in the format of a SQL Server identity column. You should paste the value into the XML element at line 30 of the RSD file.
Verifying the output
You should now be able to use the Departure Board RSD file in a Matillion API Query component, without setting any Connection Options. It’s a live feed, so the results will vary according to the time of day, depending on how many trains are due to run between the chosen stations in the next few hours.
You can write a simple orchestration job to test this, or can import the attached job APIQuerySOAP.json into your Matillion environment.