We have a job with multiple orchestration and transformation components. This job also includes SQL Script Orchestration component.
We are seeing that when this SQL script component is running, it is putting a lock on other orchestration jobs also, meaning, during this time other orchestration components (mainly just copy components from S3) are just spinning and not really loading data. The SQL script component does not reference same sets of table that other orchestration jobs are trying to run. This prevents us from running multiple orchestration components in Parallel.
Is that an intended behavior ? How can we redesign our jobs so that maximum possible orchestration components run in parallel ?
3 Community Answers
Laura Malins —
Each Matillion job holds one connection into Redshift. This is done deliberately. Please see this article around parallelism:
Thank you Laura, from that article. I understand Matillion introduces restriction when running two transformation components in one orchestration job. However our use case / question is strictly related to use of S3 Load orchestration component which copies data from S3 into Redshift. I have seen other Orchestration (specifically S3 Load copy operation) running in parallel when SQL script component is not running. Only when SQL Script component runs it locks other orchestration components (specifically S3 Load copy operation) in the same job and hence my question.
The S3 Load Component is actually pushing down a COPY statement to Redshift.
As the article Laura linked above notes, the query is processed by all compute nodes in the cluster so there is no benefit of having another query run in parallel as both queries will be allocated 50% of the compute resources and and take approx 50% longer to process.