It appears that Matillion is processing messages serially. How can I achieve parallelism?
Our job is triggered by SQS messages, and we use the variables passed from the SQS queue to ETL data into Snowflake. Messages appear to be processed one at a time, and each one takes a couple of seconds. We get lots of messages. Is there a pattern we can follow to achieve parallelism?
11 Community Answers
Dan D'Orazio —
Hi Rajish -
The nature of the queue dictates that the messages are pulled off in serial fashion. However, this should not prevent jobs from running in parallel, once Matillion picks up those messages from the queue.
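To illustrate the distinction between pulling messages serially and running the resulting work in parallel, here is a minimal sketch in plain Python. It is not Matillion's implementation: `run_job`, `drain`, and `timed` are hypothetical names, an in-memory `queue.Queue` stands in for SQS, and a 0.1 second sleep stands in for the per-message ETL work.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(message):
    # Hypothetical stand-in for the per-message ETL work.
    time.sleep(0.1)
    return f"loaded {message}"


def drain(messages, workers):
    # An in-memory queue stands in for SQS (assumption for this sketch).
    q = queue.Queue()
    for m in messages:
        q.put(m)

    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while not q.empty():
            msg = q.get()                              # messages come off one at a time
            futures.append(pool.submit(run_job, msg))  # the work itself can overlap
    # Results are returned in the order the messages were dequeued.
    return [f.result() for f in futures]


def timed(messages, workers):
    # Run drain() and report how long it took in seconds.
    start = time.perf_counter()
    results = drain(messages, workers)
    return results, time.perf_counter() - start
```

With one worker the four-message run takes roughly four times as long as with four workers, even though the dequeue step is serial in both cases.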
You are most welcome; we’re always happy to help. The Job Concurrency support page is most likely what you’re looking for. It explains how jobs run in parallel, which may help you take better advantage of concurrency.
Hey Dan - that article is a bit confusing. My concern is that, in my project, messages from SQS are picked up and put into a queue, but only one task in the queue is processed at a time.
Is there anything we can do so that multiple queued tasks run at the same time?
Also, yes, I’d recommend you use Job level variables rather than (global) Environment Variables. Environment Variables can be used by multiple jobs simultaneously, which can result in nasty logic bugs due to threading.
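The kind of bug described here can be shown with a toy model in plain Python. This is only a sketch of the general hazard, not of how Matillion stores variables: `shared_env`, `job_using_shared`, `job_using_local`, and `run_pair` are hypothetical names, and a `threading.Barrier` is used purely to force the two runs to overlap so the outcome is repeatable.

```python
import threading

# Stands in for a variable shared by every running job (assumption).
shared_env = {"file_name": None}


def job_using_shared(file_name, barrier, seen):
    shared_env["file_name"] = file_name   # each run overwrites the shared value
    barrier.wait()                        # force both runs to overlap
    seen[file_name] = shared_env["file_name"]  # may read the other run's value


def job_using_local(file_name, barrier, seen):
    local_name = file_name                # private to this run
    barrier.wait()
    seen[file_name] = local_name          # always reads its own value


def run_pair(job):
    # Start two overlapping runs of the same job with different inputs.
    barrier = threading.Barrier(2)
    seen = {}
    threads = [
        threading.Thread(target=job, args=(name, barrier, seen))
        for name in ("a.csv", "b.csv")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Calling `run_pair(job_using_shared)` leaves both runs seeing the same file name, so one of them processes the wrong input, while `run_pair(job_using_local)` lets each run see its own.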
One more clarification from the article - "The queuing of jobs with the same name applies per-node. For example, if two runs of the same job end up on the same node the second will queue behind the first, if they end up being run on different nodes they will run concurrently. This may cause deadlocks and should generally be avoided by adhering to the advice in this article."
How can jobs with the same name be picked up by two different nodes? Aren't jobs with the same name basically the same job, so they should be queued up rather than run concurrently? I'm trying to understand this so that I can design my jobs around it.
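One way to reason about the quoted passage is a toy model in which each node keeps its own queue per job name. The sketch below is plain Python and only reflects my reading of the quote, not Matillion's actual scheduler: `Cluster`, `run`, and `peak_concurrency` are hypothetical names, and a lock per (node, job name) pair stands in for the per-node queue.

```python
import threading
import time
from collections import defaultdict


class Cluster:
    def __init__(self):
        # One lock per (node, job name): models "queuing applies per-node".
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()   # protects the lock table and counters
        self.active = 0
        self.peak = 0

    def run(self, node, job_name):
        with self._guard:
            lock = self._locks[(node, job_name)]
        with lock:                       # same job on the same node waits here
            with self._guard:
                self.active += 1
                self.peak = max(self.peak, self.active)
            time.sleep(0.2)              # stand-in for the job's work
            with self._guard:
                self.active -= 1


def peak_concurrency(placements):
    # placements is a list of (node, job_name) pairs, one per run.
    cluster = Cluster()
    threads = [threading.Thread(target=cluster.run, args=p) for p in placements]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cluster.peak
```

In this model, two runs of the same job placed on `node1` never overlap, while the same two runs placed on `node1` and `node2` do.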