Name is required.
Email address is required.
Invalid email address
Answer is required.
Exceeding max length of 5KB

SQL Script orchestration Component

We have a job with multiple orchestration and transformation components. This job also includes SQL Script Orchestration component.

We are seeing that when this SQL script component is running, it is putting a lock on other orchestration jobs also, meaning, during this time other orchestration components (mainly just copy components from S3) are just spinning and not really loading data. The SQL script component does not reference same sets of table that other orchestration jobs are trying to run. This prevents us from running multiple orchestration components in Parallel.

Is that an intended behavior ? How can we redesign our jobs so that maximum possible orchestration components run in parallel ?

3 Community Answers

Matillion Agent  

Laura Malins —

Hi Devang

Each Matillion job holds one connection into Redshift. This is done deliberately. Please see this article around parallelism:

https://redshiftsupport.matillion.com/customer/portal/articles/2789349?b_id=8915

To maximise the components running in parallel they will need to run in separate jobs but as the article demonstrates this doesn’t mean Redshift will run them any quicker than it would in serial.

Thanks
Laura


Devang Patel —

Thank you Laura, from that article. I understand Matillion introduces restriction when running two transformation components in one orchestration job. However our use case / question is strictly related to use of S3 Load orchestration component which copies data from S3 into Redshift. I have seen other Orchestration (specifically S3 Load copy operation) running in parallel when SQL script component is not running. Only when SQL Script component runs it locks other orchestration components (specifically S3 Load copy operation) in the same job and hence my question.


Matillion Agent  

Paul Johnson —

Hi Devang,
The S3 Load Component is actually pushing down a COPY statement to Redshift.
As the article Laura linked above notes, the query is processed by all compute nodes in the cluster so there is no benefit of having another query run in parallel as both queries will be allocated 50% of the compute resources and and take approx 50% longer to process.

Regards,
Paul

Post Your Community Answer

To add an answer please login