Name is required.
Email address is required.
Invalid email address
Answer is required.
Exceeding max length of 5KB

JSON/XML from API

Hi,
I am using Matillion for Snowflake. We would like to use "API Query" component to fetch data from an API and store the XML or JSON that the API outputs directly in a Snowflake table (column type VARIANT) instead of flat it out in several columns. Is it possible?

Regards,
Tiago Silva

15 Community Answers

Matillion Agent  

Arawan Gajajiva —

Hello Tiago -

Yes, you can indeed preserve the JSON/XML structure and have it insert directly into a table with a variant data type column. In the API Profile RSD setup, you can define the attribute with the following parameter:

other:valueFormat="aggregate"

So, an example of how you might define an attribute might look something like this:

    <attr name="my_json"             xs:type="string"   readonly="false" other:xPath="my_json"             other:valueFormat="aggregate" />

Hope this helps!

Arawan


Tiago Silva —

Hi,

There is something I am missing here, sorry.

This is the XML returned by the API:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<GetShipmentsByDateRangeResponse xmlns="http://edisoftwebservices.com/">
<GetShipmentsByDateRangeResult xmlns:a="http://schemas.datacontract.org/2004/07/EdiSoft.Common.Domain.ExportDomain" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Shipment>
<a:ActorCsid>83</a:ActorCsid>
<a:Addresses>
<a:Address>
<a:City>ISHØJ</a:City>
</a:Address>
<a:Address>
<a:City>Jönköping</a:City>
</a:Address>
</a:Addresses>
</a:Shipment>
</GetShipmentsByDateRangeResult>
</GetShipmentsByDateRangeResponse>
</s:Body>
</s:Envelope>

This is my API profile:
<rsb:script xmlns:rsb="http://www.rssbus.com/ns/rsbscript/2/">
<rsb:info title="Consignor API">
<attr name="ActorCsid" xs:type="string" other:xPath="a:ActorCsid" />
<attr name="Addresses_XML" xs:type="string" readonly="false" other:xPath="a:Addresses" other:valueFormat="aggregate" />
</rsb:info>
...
</rsb:script>

I expected to get the following output:
ActorCsid | Addresses_XML
83 | <a:Address><a:City>ISHØJ</a:City></a:Address><a:Address><a:City>Jönköping</a:City></a:Address>

But I got:
ActorCsid | Addresses_XML
83 |


Regards,
Tiago Silva


Matillion Agent  

Arawan Gajajiva —

Hi Tiago -

It’s not clear how you are setting your RepeatElement in your API Profile. However, if you use this as an example, it seems to return back the data you are looking for from your example XML.

<rsb:script xmlns:rsb="http://www.rssbus.com/ns/rsbscript/2/">
<rsb:info title="Consignor API">
<attr name="ActorCsid" xs:type="string" other:xPath="Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult/Shipment/ActorCsid" /> 
<attr name="Addresses_XML" xs:type="string" readonly="false" other:xPath="Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult/Shipment/Addresses" other:valueFormat="aggregate" />
</rsb:info>
  <rsb:set  attr="uri"                  value="http://someapi.com/l" />
  <rsb:set  attr="RepeatElement"        value="/" />

Hope this helps?

Best regards,
Arawan


Tiago Silva —

Hi,

It doesn't work even with your last example. I get column ActorCsid (it is not XML) correctly but column Addresses_XML is always blank...

My entire API profile:
<rsb:script xmlns:rsb="http://www.rssbus.com/ns/rsbscript/2/">

<!-- Request output definition -->
<rsb:info title="Consignor API">
<attr name="ActorCsid" xs:type="string" other:xPath="a:ActorCsid" />

<attr name="Addresses_XML" xs:type="string" readonly="false" other:xPath="a:Addresses" other:valueFormat="aggregate" />

<!-- Paging meta column -->
<input name="Rows@Next" desc="Identifier for the next page of results" />
</rsb:info>

<!-- URI -->
<rsb:set attr="uri" value="https://customer-api.consignorportal.com/PortalData/PortalData.svc" />

<!-- Iterator-->
<rsb:set attr="RepeatElement" value="/Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult/a:Shipment" />

<rsb:set attr="EnablePaging" value="TRUE" />

<!-- Headers -->
<rsb:set attr="ContentType" value="text/xml;charset=utf-8" />
<rsb:set attr="Header:Name#" value="SOAPAction" />
<rsb:set attr="Header:Value#" value="http://edisoftwebservices.com/IPortalData/GetShipmentsByDateRange" />

<!-- Request -->
<rsb:script method="GET" >
<rsb:set attr="method" value="POST" />

<!-- Assign value in _input.Rows@Next to userns.pageno -->
<rsb:check attr="Rows@Next">
<rsb:set item="userns" attr="pageno" value="[_input.Rows@Next | add(1)]" />
<rsb:else>
<rsb:set item="userns" attr="pageno" value="0"/>
</rsb:else>
</rsb:check>

<!-- Set the SOAP request as the POST payload -->
<rsb:set attr="data">
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:edis="http://edisoftwebservices.com/">
<soapenv:Header/>
<soapenv:Body>
<edis:GetShipmentsByDateRange>
<edis:userName>xxxx</edis:userName>
<edis:password>xxxx</edis:password>
<edis:installationValue>xxxx</edis:installationValue>
<edis:startDateTime>2018-11-01T15:00:00</edis:startDateTime>
<edis:endDateTime>2018-11-01T15:30:00</edis:endDateTime>
<edis:pageIndex>[userns.pageno]</edis:pageIndex>
</edis:GetShipmentsByDateRange>
</soapenv:Body>
</soapenv:Envelope>
</rsb:set>

<rsb:call op="xmlproviderGet">
<!-- Set Rows@Next to invoke paging -->
<rsb:set attr="Rows@Next" value="[userns.pageno]" />

<rsb:push/>
</rsb:call>
</rsb:script>

</rsb:script>

Thanks in advance.

/TS


Matillion Agent  

Arawan Gajajiva —

Hi Tiago -

I think your issue lies in your RepeatElement and xPath values.

I tested with your RSD Profile, modified to use the XML you initially provided. I duplicated the behavior you are seeing. I then changed the following portions of your RSD profile and got what looks like correct data back.

Line 7 (Addresses attribute)

<attr name="Addresses_XML" xs:type="string" readonly="false" other:xPath="Addresses" other:valueFormat="aggregate" />

Line 17 (RepeatElement)

<rsb:set attr="RepeatElement" value="/Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult/Shipment/" />

Can you give that a try and see if that works better for you?

Regards,
Arawan


Tiago Silva —

Hi Arawan,

Yes, it worked! Thanks a lot. Appreciated.

Just one more question out of curiosity.
Let's say that instead of flattening out the top level of each Shipment and only keep the XML of the sub-levels (Addresses in the example I sent you) as I currently do, I would like to store each Shipment's entire XML: <Shipment><ActorCsid>1</ActorCsid><Addresses><Address><City>AAA</City></Address></Addresses></Shipment>.
Is it possible to store each Shipment's XML in a row using an API profile or would I have to go a level up in the XML and get /Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult (which contains multiple Shipments) and then use Snowflake functions such as FLATTEN to convert one row (from API Query component) into several rows (one per address)?

Regards,
Tiago Silva


Matillion Agent  

Arawan Gajajiva —

Hi Tiago -

Yes, you should be able to do what you have described using the same syntax as discussed in this thread. As you suggest, you would need to define your RepeatElement and Attribute definitions appropriately.

For example:

Shipment Attribute:

  <attr name="shipment" xs:type="string" other:xPath="Shipment" other:valueFormat="aggregate" /> 

RepeatElement:

  <rsb:set attr="RepeatElement" value="/Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult" />

Best regards,
Arawan


Tiago Silva —

Hi,
That would return only the 1st Shipment, isn't it? There is only one GetShipmentsByDateRangeResult (/Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult) and although there are many <Shipment> in it I only get the 1st one.

/TS


Matillion Agent  

Arawan Gajajiva —

Hi Tiago -

I was suggesting based on the example data you previously shared. Essentially, you want to define a RepeatElement relative to the node in the response that you want one one row of data to represent. From there, you need to define the xpath of your attributes relative to the RepeatElement. Using the other:valueFormat=“aggregate” syntax allows you to preserve the XML/JSON of that attribute if it contains a nested structure.

Best regards,
Arawan


Tiago Silva —

Hi,

I am sorry but didn't quite get that one.
Let's take an example. This the XML:
<Envelope>
<Body>
<GetShipmentsByDateRangeResponse>
<GetShipmentsByDateRangeResult>
<Shipment>
<ActorCsid>1</ActorCsid>
<Addresses>
<Address>
<City>AAA</City>
</Address>
</Addresses>
</Shipment>
<Shipment>
<ActorCsid>2</ActorCsid>
</Shipment>
</GetShipmentsByDateRangeResult>
</GetShipmentsByDateRangeResponse>
</Body>
</Envelope>

And I would like to get 2 rows in Snowflake (1 column):
Row 1: <Shipment><ActorCsid>1</ActorCsid><Addresses><Address><City>AAA</City></Address></Addresses></Shipment>
Row 2: <Shipment><ActorCsid>2</ActorCsid></Shipment>

/TS


Matillion Agent  

Arawan Gajajiva —

Hi Tiago -

It sounds like you want to get one row of data per Shipment. So, you would define your RepeatElement like this:

  <rsb:set attr="RepeatElement" value="/Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult/Shipment" />

Then, relative to that RepeatElement, you can parse out the attributes like:

    <attr name="ActorCsid"                xs:type="integer" readonly="false" other:xPath="ActorCsid"                  />
    <attr name="Addresses"   xs:type="string"  readonly="false" other:xPath="Addresses"  other:valueFormat="aggregate"   />

This will return each ActorCsid in a row of data along with the Addresses XML for that Shipment.

Best Regards,
Arawan


Tiago Silva —

Hi,

That's exactly how I implemented it.
But I am curious to know if instead of having 1 row per Shipment with several columns (ActorCsid, Addresses, etc) it's possible to have 1 row per Shipment with only 1 column (the XML of the Shipment).

Let's take an example. This the XML:
<Envelope>
<Body>
<GetShipmentsByDateRangeResponse>
<GetShipmentsByDateRangeResult>
<Shipment>
<ActorCsid>1</ActorCsid>
<Addresses>
<Address>
<City>AAA</City>
</Address>
</Addresses>
</Shipment>
<Shipment>
<ActorCsid>2</ActorCsid>
</Shipment>
</GetShipmentsByDateRangeResult>
</GetShipmentsByDateRangeResponse>
</Body>
</Envelope>

And I would like to get 2 rows in Snowflake (in a table with only 1 column):
Row 1: <Shipment><ActorCsid>1</ActorCsid><Addresses><Address><City>AAA</City></Address></Addresses></Shipment>
Row 2: <Shipment><ActorCsid>2</ActorCsid></Shipment>

Would it be possible?

/TS


Matillion Agent  

Kalyan Arangam —

hi Tiago,

From what I understand, we cannot derive the aggregate at the Shipment level.
I am checking this with the engineering team.

Best
Kalyan


Matillion Agent  

Kalyan Arangam —

Hi Tiago,

I was wrong. Looks like its supported. Please make the following changes to your RSD.

RepeatElement
bc. <rsb:set attr="RepeatElement" value="/Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult/*" />

Column definitions
bc..<rsb:info title="ship" desc="Generated schema file." xmlns:other="http://www.rssbus.com/ns/rsbscript/2/other">
<attr name="Shipment" xs:type="string" other:valueFormat="aggregate" other:xpath="/Envelope/Body/GetShipmentsByDateRangeResponse/GetShipmentsByDateRangeResult/Shipment" />

</rsb:info>

Hope that helps.

Best
Kalyan


Tiago Silva —

Hi,
That's excellent. Thanks for your help!
I have implemented it with a LATERAL FLATTEN in Snowflake but next time this will save me time.

/TS

Post Your Community Answer

To add an answer please login