
Job Dependencies

I am trying to schedule a dependency between jobs and can't seem to find a good way to do this. The scenario is as follows:
1) Orchestration Job 1 is triggered by an AWS Lambda function, which in turn fires when a trigger file arrives in a specific S3 bucket, and loads a Redshift table called Orders.
2) Orchestration Job 2 is triggered by an AWS Lambda function (file-trigger based as well) and loads a Redshift table called Customers.
3) Once 1) and 2) complete, I want to run a transformation job which joins Orders and Customers to create a new table, Orders_Customers_Pv.

I do not want to combine the 2 orchestration jobs since they are already fairly complex.

Is there a simple way to do this?

3 Community Answers

Matillion Agent  

Kalyan Arangam —

hi Tejas,

Please review the screenshot below of a workflow involving a Main job which calls the Orders, Customers and Orders_Customers_Pv jobs as needed.

https://s3-eu-west-1.amazonaws.com/mtln-public-data/dependant+job+workflow.png

The key here is that Matillion will only run one instance of a job at a time; any further instances will be queued. Using a "Main Job" ensures that Orders and Customers are each loaded in turn but never in parallel.

The Orders and Customers jobs each set a flag to indicate they have loaded.

The Python component retrieves the flags that were set.

The If component decides whether it is time to load Orders_Customers_Pv.
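The readiness check inside the Python component can be sketched roughly as below. This is a minimal illustration, not the exact script from the screenshot: the flag names `orders_loaded` and `customers_loaded` are assumed, and in a real Matillion job the values would come from job/environment variables rather than literals.

```python
# Minimal sketch of the check the Python component performs before the
# If component decides whether to run Orders_Customers_Pv.
# Flag names are assumptions, not the actual names in the screenshot.

def ready_to_transform(orders_loaded, customers_loaded):
    """Return True only when both upstream loads have flagged completion."""
    return bool(orders_loaded) and bool(customers_loaded)

# Example: only Orders has finished so far, so the join must wait.
print(ready_to_transform(True, False))  # False
print(ready_to_transform(True, True))   # True
```

The If component then branches on this result: run the transformation when both flags are set, otherwise end the job and wait for the next trigger.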

Hope that helps.

Best
Kalyan


Tejas Baxi —

Hi Kalyan,

Thanks for the reply. I am a little confused. Since my trigger is file based (meaning the Orders or Customers loads get triggered when a trigger file is dropped by an upstream process into an S3 bucket), the file trigger will fire the Orders and Customers jobs. Wouldn't it also queue this job and run it again as soon as the previous run completes? I want to avoid double execution for fear of data duplication/invalidation.

Can you provide me with some more information on whether that would happen?


Matillion Agent  

Kalyan Arangam —

Hi Tejas,

In this case, your triggers themselves will not run the Orders or Customers jobs directly. Both of them will trigger the controlling job, "Main job" (from my example).

Even if both your files land at the same time and raise two triggers for "Main job", Matillion will only run one at a time and will queue the other instance. So either Orders or Customers will run first, followed by the other.

It's important that you pass the name of the file, or a flag, to the job indicating whether it was an Orders file or a Customers file that raised the event. The first "If Component" uses this to determine which load to execute: Orders or Customers.
The following article shows how to pass the name of the file that triggered the Lambda function.

https://redshiftsupport.matillion.com/customer/en/portal/articles/2243961-triggering-etl-from-an-s3-event-via-aws-lambda?b_id=8915
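As a rough sketch of the idea in that article (not Matillion's exact integration), a Lambda handler can pull the triggering object's key out of the S3 event and include it as a job variable in the message it sends to the queue Matillion listens on. The group, project, job, and variable names below are all placeholders.

```python
import json

def build_matillion_message(event, group="MyGroup", project="MyProject",
                            job="Main job"):
    """Build a message body telling Matillion which job to run.

    Extracts the object key from a standard S3 event record and passes it
    to the job as the variable 'trigger_file' (an assumed name), so the
    first If Component can tell an Orders file from a Customers file.
    """
    record = event["Records"][0]
    key = record["s3"]["object"]["key"]
    return {
        "group": group,
        "project": project,
        "environment": "Live",
        "job": job,
        "variables": {"trigger_file": key},
    }

def lambda_handler(event, context):
    message = build_matillion_message(event)
    # In the real function you would now publish this to the SQS queue
    # Matillion is configured to listen on, e.g.:
    #   boto3.client("sqs").send_message(
    #       QueueUrl=QUEUE_URL, MessageBody=json.dumps(message))
    return json.dumps(message)
```

Inside "Main job", the `trigger_file` variable (or a flag derived from it) then drives the first If Component, exactly as described above.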

Hope that helps.

Best
Kalyan
