Paging with the API Query Component

This document builds on an earlier article "Writing API Profiles", and describes how to include paging in an API Profile that you have created. Other documents in this series: Using Parameters With API Profiles
Authentication with API Profiles

Overview


Many API's support "paging" or "pagination" to make it easier to download large volumes of data – as manageable chunks. Downloading the complete dataset involves issuing multiple requests with each request specifying the part of the data to download. Please read your API's documentation to understand how paging (if used) is implemented in that API.

The API Query Component supports certain constructs in RSD definition to allow issuing multiple requests to download relevant data in chunks. All downloaded data is then pushed into the specified target-table.
 

Implementing Paging in RSD

To support paging, introduce pseudo column rows@next in your list of columns and set the rows@next attribute to any information needed to make the request for the next page of data. When this value is set in the output, the driver will automatically call the RSD script again with the rows@next value in the input after it is finished returning results for this page. You can use the value of this input to modify the request on the next pass to get the next page of data.

The process can be broken down into 3-steps.

 

Step 1

Introduce pseudo column rows@next in your list of columns, within the <rsb:info tag. Typically this should be the last entry in your <rsb:info tag.

<input name="rows@next" desc="A system column used for paging. Do not change." />

 

Step 2

Set the attribute uri to an appropriate value. The value for rows@next value can be accessed via the _input collection (more on this in the example below):

<rsb:check attr="_input.rows@next">
    <rsb:set attr="uri" value="[_input.rows@next]" />
    <rsb:else>
        <rsb:set attr="uri" value="<first page's URL>" />
    </rsb:else>
</rsb:check>
 

Step 3

Set the value of rows@next to a value that can be used to identify if a subsequent request should be issued. For example, your API may return the next page's URL in the response. You can obtain this value by providing the XPath to the URL:

<rsb:set attr="elementmappath#" value="/next_page" />
<rsb:set attr="elementmapname#" value="rows@next" />

Please note, the above is just to illustrate that we can set a value for rows@next based on incoming data. Depending on the structure of the data and information available, we may have to take a different approach to updating rows@next. Please contact support if you need help with a specific API.

Step 2 will always hit the <rsb:else /> block on first request as rows@next will be NULL.

 

Step 4

Enable API paging by adding the following to the <rsb:script> section of your rsd.

<rsb:set attr="EnablePaging" value="TRUE" />
 

Example

Now lets discuss this in the context of an example based on REST API from desk.com, for which documentation can be found here. Below is an initial request and Response we will use to build our RSD to support paging.

Initial Request

https://api.desk.com/api/v2/cases?sort_field=updated_at&sort_direction=desc

Sample Response

Below is a snippet of the JSON returned…

{
  "page": 1,
  "total_entries": 1920,
  "_links": {
    "self": {
      "class": "page",
      "href": "/api/v2/cases?page=1&per_page=50&sort_direction=desc&sort_field=updated_at"
    },
    "first": {
      "class": "page",
      "href": "/api/v2/cases?page=1&per_page=50&sort_direction=desc&sort_field=updated_at"
    },
    "last": {
      "class": "page",
      "href": "/api/v2/cases?page=39&per_page=50&sort_direction=desc&sort_field=updated_at"
    },
    "next": {
      "class": "page",
      "href": "/api/v2/cases?page=2&per_page=50&sort_direction=desc&sort_field=updated_at"
    },
    "previous": null
  },
  "_embedded": {
    "entries": [
      {
        "id": 2162,
        "blurb": "help implement paging with API Query...",
        "changed_at": "2017-05-25T10:50:29Z",
        "created_at": "2017-05-23T12:08:58Z",
        "custom_fields": {

The header in the JSON response contains links to “first”, “last”, “next” and “previous” pages followed by a list of 50 cases (default). Using this information we can navigate between pages of information and retrieve data. Note that “previous” is null for the first page. Similarly, “next” will be null for the last page. The “last” link tells us we have 39 pages of information to download.

 

Building our RSD

The goal is to build an RSD that will issue the following requests.

Initial request

https://api.desk.com/api/v2/cases?sort_field=updated_at&sort_direction=desc
 

Page 2

https://api.desk.com/api/v2/cases?page=2&per_page=50&sort_direction=desc&sort_field=updated_at
 

Page 3

https://api.desk.com/api/v2/cases?page=3&per_page=50&sort_direction=desc&sort_field=updated_at

…Page 39
https://api.desk.com/api/v2/cases?page=39&per_page=50&sort_direction=desc&sort_field=updated_at


Below is the RSD required to work with this API. The lines specific to paging are highlighted in yellow. Please note, the URL api.desk.com needs to be updated with your DESK end-point if you have an account with Desk.com and would like to use the following RSD.

<rsb:script xmlns:rsb="http://www.rssbus.com/ns/rsbscript/2/">
  <rsb:info title="Desc Cases" desc="List of cases from Desk.com." >
    <!-- Column definitions -->
    <attr name="id" xs:type="integer"  other:xPath="id" />
    <attr name="subject" xs:type="string"  other:xPath="subject" />
    <attr name="create_date" xs:type="datetime"  other:xPath="received_at" />
    <attr name="status" xs:type="string"   other:xPath="status" />

    <attr name="next_page" xs:type="string"   other:xPath="/_links/next/href" />

    <!-- Paging meta column -->
    <input name="Rows@Next" desc="Identifier for the next page of results" />
  </rsb:info>

  <rsb:set attr="RepeatElement" value="/_embedded/entries" />

  <rsb:script method="GET" >
   <rsb:set attr="EnablePaging" value="TRUE" />
    <rsb:check attr="Rows@Next">
      <rsb:set attr="uri" value="https://api.desk.com[_input.Rows@Next]"/>
      <rsb:else>
        <rsb:set attr="uri" value="https://api.desk.com/api/v2/cases/search?status=pending&amp;per_page=50&amp;sort_direction=desc&amp;sort_field=updated_at"/>
      </rsb:else>
    </rsb:check>

    <rsb:call  op="jsonproviderGet">
      <rsb:check attr="next_page" >
        <rsb:set  attr="Rows@Next" value="[next_page]" />
      </rsb:check>
      <rsb:unset attr="next_page"/>
      <rsb:push/>
    </rsb:call>
  </rsb:script>

</rsb:script>
 

Step 1

Introduce metadata column rows@next. We also introduce a column next_page to capture the appropriate value from the resulting JSON. This is done as pseudo-columns like rows@next do not support other:xpath attribute and hence cannot map to a value directly. We then assign this value to rows@next in Step 3.
<attr name="next_page" xs:type="string" other:xPath="/_links/next/href" />
<!-- Paging meta column →
<input name="Rows@Next" desc="Identifier for the next page of results" />
 

Step 2

On every pass, the API driver evaluates the value in the uri attribute and issues a request.

<rsb:check attr="rows@next">
  <rsb:set attr="uri" value="https://api.desk.com[_input.rows@next]"/>

  <rsb:else>
    <rsb:set attr="uri"  value="https://api.desk.com/api/v2/cases?sort_field=updated_at&sort_direction=desc"/>
  </rsb:else>
</rsb:check>

Note the use of [_input.rows@next]

The initial request always evaluates to <rsb:else> as rows@next is NULL. We then set appropriate value in Step 3 so subsequent requests do not hit else.

 

Step 3

Below is the RSD snippet where we set the value for rows@next

<rsb:call  op="jsonproviderGet">
  <rsb:check attr="next_page" >
    <rsb:set attr="rows@next" value="[next_page]" />
  </rsb:check>
  <rsb:unset  attr="next_page"/>
  <rsb:push/>    
</rsb:call>


A few points of note

  • The <rsb:call/> block (lines 1 – 7) is called after the data is downloaded and converted into an in-memory table by evaluating xpaths from <rsb:info/> block.
  • RSD defined in the <rsb:call /> block is called for every row returned. Desk.com API returns 50 rows with each request so lines 2-7 above will be evaluated for each row.
  • Line 3 checks if attribute/column next_page for that row has a value and assigns it to rows@next. The cell values for a row are available as attributes referenced by column name. The last page/request will return NULL into next_page hence Line 2 will evaluate to false and no value is assigned to rows@next.    
  • <rsb:unset /> on line 7 lets you delete a column from the resulting output. So there will be no next_page column in the final output – although we have a column definition for it. Of course you may remove this line if you need next_page.
  • <rsb:push/> on line 8 pushes an item into the output feed of the script. If there are no <rsb:push/> elements in your script, all rows are discarded and no data is loaded.


Conclusion

Hopefully this article would have given you a general idea of how to implement paging with the API Query component.

Paging is a common design-pattern with modern REST API’s. The semantics generally differ and your use-case may require a completely different approach.

Please get in touch (support@matillion.com) if you need help building an RSD for your API.