It appears that Matillion is processing messages serially. How can I achieve parallelism?

Our job is triggered by an SQS message, and we use variables passed from the SQS queue to ETL data into Snowflake. It appears that messages are processed one at a time, and each message takes a couple of seconds. We get lots of messages. Is there a pattern we can follow to achieve parallelism?

11 Community Answers

Matillion Agent  

Dan D'Orazio —

Hi Rajish -

The nature of the queue dictates that the messages are pulled off in serial fashion. However, this should not prevent jobs from running in parallel, once Matillion picks up those messages from the queue.

Best -
Dan


Rajish Shakya —

Thanks Dan -

Is there an article or a blog post you can point me to that explains how we can achieve that?


Matillion Agent  

Dan D'Orazio —

Hi Rajish -

You are most welcome, we’re always happy to help. The Job Concurrency support page is most likely what you’re looking for. It explains how jobs run in parallel, which may help you take better advantage of concurrency.

Let us know if you need anything else.

Best -
Dan


Rajish Shakya —

Hey Dan - that article is a bit confusing. My concern is that in my project, messages from SQS are picked up and put into a queue, but only one task in the queue is processed at a time.

Is there anything we can do so that multiple queued tasks are processed at the same time?

Rajish




Matillion Agent  

Ian Funnell —

Hi Rajish,

I think the limit you’re hitting is not to do with SQS. Regardless of how a job is launched, multiple runs of the same job will queue behind each other.

So:

  • Sending 16 SQS requests to run the same job will cause them to run sequentially, as you are seeing
  • Sending 16 SQS requests to run 16 different jobs will result in all 16 running concurrently (assuming nothing else is running)
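To make the two cases above concrete, here is a minimal Python sketch. It is not Matillion code and calls no Matillion or AWS API; it only models the rule described above under some assumptions: every run is submitted at time zero, each run takes a fixed 2 seconds, and the job names (load_orders and so on) are made up for illustration.

```python
from collections import defaultdict


def simulate_finish_times(job_names, run_seconds=2.0):
    """Return the finish time of each run, all submitted at time 0.

    Assumed model: runs of the same job name queue behind each other,
    while runs of different job names proceed at the same time.
    """
    next_free = defaultdict(float)  # job name -> time its queue is next free
    finish_times = []
    for name in job_names:
        end = next_free[name] + run_seconds
        next_free[name] = end
        finish_times.append(end)
    return finish_times


# 16 requests for the same (hypothetical) job: they run one after another.
same_job = simulate_finish_times(["load_orders"] * 16)

# 16 requests spread over 16 differently named (hypothetical) jobs.
different_jobs = simulate_finish_times([f"load_orders_{i}" for i in range(16)])

print(max(same_job))        # 32.0
print(max(different_jobs))  # 2.0
```

Under these assumptions the same-job case takes 16 times as long as the different-jobs case, which matches the serial behavior reported in the question.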

Best regards,
Ian


Rajish Shakya —

Thanks Ian. That helps.

Is 16 the hard limit?

Rajish


Rajish Shakya —

Also, do these 16 jobs need to have their own variables (if we are passing variables from SQS)? I would assume that shared variables could overwrite each other?

Rajish


Matillion Agent  

Ian Funnell —

Hi Rajish,

Yes, 16 is a hard limit.

Also, yes, I’d recommend you use Job level variables rather than (global) Environment Variables. Environment Variables can be used by multiple jobs simultaneously, which can consequently result in nasty logic bugs due to threading.
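The kind of bug meant here can be illustrated with a small Python sketch. This is only an analogy, not Matillion's implementation: a plain dictionary stands in for a shared (global) variable store, separate dictionaries stand in for per-run job variables, and the names start_run, finish_run and file_name are hypothetical.

```python
def start_run(scope, file_name):
    """A run begins by storing the value it received (e.g. from a message)."""
    scope["file_name"] = file_name


def finish_run(scope):
    """Later, the same run reads the value back to do its work."""
    return f"loaded {scope['file_name']}"


# Shared store: both runs write to the same place.
shared_env = {}
start_run(shared_env, "a.csv")
start_run(shared_env, "b.csv")  # overwrites the first run's value
shared_results = [finish_run(shared_env), finish_run(shared_env)]

# Per-run scopes: each run keeps its own copy.
scope_a, scope_b = {}, {}
start_run(scope_a, "a.csv")
start_run(scope_b, "b.csv")
isolated_results = [finish_run(scope_a), finish_run(scope_b)]

print(shared_results)    # ['loaded b.csv', 'loaded b.csv']
print(isolated_results)  # ['loaded a.csv', 'loaded b.csv']
```

With the shared store, the first run silently processes the second run's value. Keeping the value in a scope owned by the run avoids that.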

Please see this document for discussions on Job Concurrency.

Best regards,
Ian


Rajish Shakya —

Ian - this is really helpful.

One more clarification from the article - "The queuing of jobs with the same name applies per-node. For example, if two runs of the same job end up on the same node the second will queue behind the first, if they end up being run on different nodes they will run concurrently. This may cause deadlocks and should generally be avoided by adhering to the advice in this article."

How can jobs with the same name be picked up by two different nodes? Aren't jobs with the same name basically the same job, so they should be queued up rather than run concurrently? I'm trying to understand this so that I can design my jobs around it.

Rajish


Matillion Agent  

Ian Funnell —

Hi Rajish,

That only applies in the HA-Cluster scenario, in which two separate Matillion instances (“nodes”) share a common metadata repository.

The lock which prevents multiple instances of the same job running simultaneously applies per node rather than to the cluster as a whole.

Also, in HA you have no control over which node will be selected to run a job. So if you submit two simultaneously:

  • they might both end up running on the same node, in which case one will queue behind the other and they will run serially
  • they might get allocated onto two different nodes, in which case they really will run in parallel
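The two outcomes above can be sketched in Python. Again, this is only a model of the described behavior, not Matillion code: it assumes the same-job lock is keyed by node and job name, that both runs are submitted at time zero, and that each run takes 2 seconds. The node and job names are hypothetical.

```python
from collections import defaultdict


def total_seconds(runs, run_seconds=2.0):
    """Time until the last run finishes.

    runs: list of (node, job_name) pairs, all submitted at time 0.
    Assumed model: the lock is held per (node, job_name), not cluster-wide.
    """
    next_free = defaultdict(float)  # (node, job name) -> time lock is free
    latest_finish = 0.0
    for node, job_name in runs:
        end = next_free[(node, job_name)] + run_seconds
        next_free[(node, job_name)] = end
        latest_finish = max(latest_finish, end)
    return latest_finish


# Both runs land on the same node: the second queues behind the first.
same_node = total_seconds([("node-1", "load_orders"), ("node-1", "load_orders")])

# The runs land on different nodes: they run in parallel.
split_nodes = total_seconds([("node-1", "load_orders"), ("node-2", "load_orders")])

print(same_node)    # 4.0
print(split_nodes)  # 2.0
```

Because the node is not under your control, a job design has to be correct in both cases, for example by not relying on two runs of the same job never overlapping.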

In other words, best to cater for this when you design the job.

All the above is only a consideration for HA Clusters.

Best regards,
Ian


Rajish Shakya —

Ian - thanks a lot for your help and prompt answers.
