Writing API Profiles


Important: New API drivers have changed the way that input parameters are set up in the rsd. With the latest version of Matillion, users must set up input parameters as inputs rather than dummy columns. See below for an example of the change.



Important: New API drivers return just the date on 'Date' data types as opposed to the datetime that older versions returned.
 

Overview

Matillion’s API Query component provides a way to query a JSON- or XML-based API, and load the resulting data into a table.

It’s a powerful and flexible component, which can deal with APIs that contain nested data structures, and those which require authentication and paging.

When working with tables, rows and columns, the main step in handling the hierarchical data from APIs is to relationalise it. This is done with a Matillion “API Profile” specification, which you will need to create.

Every API Profile contains one or more RSD “files”. Each file declares how to map an API response into rows and columns. In general you will need one RSD file for every different endpoint in the API.

This document describes how to author your first API Profile, and use it with a Matillion API Query component.

The AWS IP Ranges API

This is a useful first example, because it’s quite simple, is universally available, and requires no authentication or paging.

The API returns a list of the IP address ranges which AWS use for services, and is returned as a JSON document with a nested array. You can access it from a web browser by just navigating to the URL: https://ip-ranges.amazonaws.com/ip-ranges.json

Creating a Matillion API Profile

Start by navigating to Project / Manage API Profiles. It should open the main dialog window, containing one API Profile for every API that has been configured. Matillion does ship with some examples, including one for accessing its own REST API.

Click the + sign to add a new one, and name it "AWS".

Find "AWS" in the list, and press the green cog icon to edit it.

This should open a “Configure API Profile” screen. Press the green + sign again to add a new "file", and name it "ip-ranges".

Don't give it any suffix at this stage.

Now the “Configure API Profile” screen should reappear, with the file ip-ranges.rsd open in editable mode. Paste the contents of the attached file ip-ranges.rsd.txt into the (initially-empty) script window on the right.

After pressing OK, you will be able to edit the AWS API Profile again (use the same cog icon again). Press Test, and left-click on the AWS IP Ranges "table" at bottom left. Matillion should invoke the API and you will hopefully see some rows and columns coming back to the screen!

RSD file anatomy

  • Each RSD “file” in an API Profile describes how to handle one of the endpoints in the API. The main features are:
  • Naming the URI (line 8)
  • Deciding how “rows” are defined (line 10)
  • Accessing the columns from every row (lines 4 and 5)
  • Informing Matillion what kind of API it is dealing with (line 13). In this case it’s a jsonproviderGet, but there is also an equivalent xmlproviderGet
  • Describing the REST operation (line 12), which is an HTTP “GET” in this case

Once you have a working example, it’s usually best to create a second RSD file in the same profile which you can experiment with. Choose a different name for this, copy and paste the content from the working file, and use the Test button to verify that any changes you make are working as expected.

 

Using an API Profile in Matillion

  • Once you've successfully got some data back in the API Profile test window, you are ready to use it in a Matillion API Query component.
  • Create a new Orchestration job
  • Drag on an API Query component
  • Set it to Basic mode
  • The "Profile" is "AWS"
  • The "Data Source" is "ip-ranges" (which should appear in the dropdown list)
  • Choose an S3 Staging Area and a Target Table name

When you run this component, it should query the API and create a named table with the columns described in the RSD script. This completes the setup of a Matillion component which invokesan external API, translates the returned JSON into relational rows and columns, and loaded the data into a new table.

This table can now be used in Transformation jobs to combine with data from other sources. This enables you to create joined-up information from data sets that would otherwise have remained siloed.

You will also find attached an example Matillion job which uses the newly created API Profile. Import the file aws_ip_ranges_api.json into your Matillion instance using the Project / Import menu.

Paging

Enable API paging by adding the following to the <rsb:script> section of your rsd.

<rsb:set attr="EnablePaging" value="TRUE" />