I am getting Disk Full errors on some of our larger, long-running orchestrations. I see plenty of suggestions when searching for Redshift disk errors, but I'm not finding much relative to Matillion. Are there things we need to watch out for that can cause Disk Full errors even when there appears to be ample room on the server?
7 Community Answers
Ian Funnell —
Do you know if these errors are coming from Redshift or from AWS/EC2?
If the error is occurring when the orchestration runs a Transformation job, then it’s likely to be from Redshift.
And vice versa: if the error is occurring when the orchestration runs an orchestration component, then it’s more likely to be from AWS/EC2.
In your AWS console, please go to Redshift, select your cluster, go to the Cluster Performance tab and take a look at the "Percentage disk space used" graph while your job is running. If it gets close to 100%, you may need to scale up the cluster.
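If console access is a problem, a rough sketch of checking the same thing directly from SQL, using the STV_PARTITIONS system table (access to system tables is required; this is an approximation of overall usage, not a per-table breakdown):

```sql
-- Approximate cluster-wide disk usage as a percentage.
-- stv_partitions reports used and total 1 MB blocks per disk partition.
SELECT SUM(used)::float / SUM(capacity) * 100 AS pct_disk_used
FROM stv_partitions;
```

Running this while the orchestration executes will show whether usage spikes toward 100% during the failing step even if it looks fine at rest.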
Thanks Ian. I'm still trying to gain access to our Console but from what I've been able to see on Redshift I don't think we are out of space. Do you know of other things that can cause Redshift Disk Full errors? This was a pretty large orchestration that ran for about 90 minutes before we saw the error. Just looking for other items to investigate.
Thanks Kalyan. I am still trying to gain the appropriate access to the console to make the changes you suggested. I believe the disk full error is occurring during a very large update that uses a large select statement (several joins, etc.). I'd imagine this is the culprit. Assuming so, do you have any suggestions on what to do in this situation?
Assuming that’s the case, I would recommend breaking down your transform into multiple jobs.
Each job would perform a portion of the operation and write its result to a temporary table. The next job starts from that temporary table and performs the rest of the operations. Start by breaking it into two transformations, and split further if necessary.
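As a rough sketch of that split (all table and column names here are hypothetical; in Matillion each step would typically be its own transformation job writing to an intermediate table):

```sql
-- Job 1: materialize the expensive join once, keeping only the
-- columns the later steps actually need.
CREATE TABLE stage_order_totals AS
SELECT o.customer_id,
       SUM(l.amount) AS total_amount
FROM orders o
JOIN order_lines l ON l.order_id = o.order_id
GROUP BY o.customer_id;

-- Job 2: the large update now reads the small intermediate table
-- instead of repeating the multi-way join.
UPDATE customer_totals t
SET total_amount = s.total_amount
FROM stage_order_totals s
WHERE t.customer_id = s.customer_id;

-- Drop the intermediate table once the orchestration finishes.
DROP TABLE stage_order_totals;
```

Each step's intermediate result is far smaller than the working set of the original single statement, which is what relieves the disk pressure.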
Also consider selecting only the columns you need, and whether every column really has to be carried through the entire operation. This reduces the amount of data Redshift has to work with.
Can you check the skew on the tables you are using in the joins? Try changing the DISTKEY if they are skewed.
It's possible they are not distributed properly. If the query that's failing has a join clause, there's a good chance that's what's causing your errors. When Redshift executes a join, it has a few strategies for connecting rows from different tables together. By default, it performs a "hash join" by creating hashes of the join key in each table, and then it distributes them to each other node in the cluster. That means each node will have to store hashes for every row of the table. When joining large tables, this quickly fills up disk space.
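A sketch of checking skew and redistributing (table and column names are hypothetical), using the SVV_TABLE_INFO system view and Redshift's ALTER DISTKEY support:

```sql
-- Check distribution style and row skew for the tables in the join.
-- skew_rows is the ratio of rows on the fullest slice to the emptiest;
-- values well above 1 indicate uneven distribution.
SELECT "table", diststyle, skew_rows
FROM svv_table_info
WHERE "table" IN ('orders', 'order_lines')
ORDER BY skew_rows DESC;

-- If a table is skewed, distributing both tables on the join key lets
-- Redshift co-locate matching rows on the same node, so it no longer
-- needs to spill distributed hashes to disk during the join.
ALTER TABLE orders ALTER DISTKEY order_id;
ALTER TABLE order_lines ALTER DISTKEY order_id;
```

Note that ALTER DISTKEY rewrites the table's data, so run it during a quiet window for large tables.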