Name is required.
Email address is required.
Invalid email address
Answer is required.
Exceeding max length of 5KB

Matillion pattern to flatten a parent-child hierarchy?

What's the best pattern y'all have come up with, in Matillion, for "flattening" a parent/child (AKA self-referencing) hierarchy? In a way that produces results like this:

I've started some ideas for a Matillion transformation job. But I bet it's been done already, so I'm wondering (a) if anyone could share the best/most efficient way they've come up with?

I'm also wondering (b) whether you can set up a series of jobs...probably using Matillion variables and a Loop Iterator component...that will dynamically determine the maximum hierarchy depth N, and automatically add exactly N-1 columns (one for each non-leaf level)? So that you don't have to specify the maximum hierarchy depth?

I am going to start working on this now, but please let me know if you can share a solution for this. Thanks in advance!

2 Community Answers

Matillion Agent  

Kalyan Arangam —

Hi Kevin,

We had a similar question a few months ago and here’s a link to it.

As advised, it is important to understand the maximum depth a hierarchy may descend to. Its easier if we know that in advance.

a) Given the lack of support in redshift, there is no easy way. It depends on the structure of data. Please can you share a sample with us?
If you source database supports flattening hierarchies (SQL Server/Oracle?) then I would recommend attempting it in your source and pulling the results into redshift.

b) depending on your data, we may be able to use a python component to fire appropriate queries using the ‘cursor’ object to determine the maximum depth and then drive a set of jobs to help flatten the hierarchy.

We can also explore using a python UDF which returns the maximum depth of the hierarchy.


Kevin Havice —

Several of our dimensions are "ragged," parent/child hierarchies. And they're not all from the same source system. So, I wanted to figure out how to do this in Matillion. It took a few hours, but I got it. I didn't use any Python components.

I did use a predetermined, hard-coded "maximum hierarchy depth." It does seem pretty hard to determine the max depth and then have Matillion do the following, with a dynamic number of levels.

The basic pattern is:

1. Set max depth to N.

2. Do one pass "up" the hierarchy, with a series of N-1 self-joins, adding N-1 "Helper" columns. For each record, in "Helper1" you store the name of its parent; in "Helper2" you store the name of its parent's parent; et cetera. When you run out of parents, then you leave the values of subsequent "Helper" columns NULL.

3. Do one pass "down" the hierarchy, adding N-1 "HierarchyLevel" columns. For each record, you calculate the value of its "HierarchyLevel1" column (its top level of the hierarchy) as the *last* non-NULL "Helper" column. For example, "Helper4" -- for a leaf record that is at the 5th level down. Then, you calculate the value of its "HierarchyLevel2" column as the second-to-last non-NULL "Helper" column -- "Helper3" from the example. And when you run out of "Helper" columns (for any records that are not at the maximum hierarchy depth N), you just keep repeating the name of the leaf.

Post Your Community Answer

To add an answer please login