
Disk Full Errors

Hello,

I am getting Disk Full errors on some larger running orchestrations. I see plenty of suggestions when searching for Redshift disk errors, but I'm not finding much specific to Matillion. Are there things we need to watch out for that can cause Disk Full errors even when there is ample room on the server?

Thanks


Tom

8 Community Answers

Matillion Agent  

Ian Funnell —

Hi Tom,

Do you know if these errors are coming from Redshift or from AWS/EC2?

  • If the error is occurring when the orchestration runs a Transformation job, then it’s likely to be from Redshift.
  • And vice versa: if the error is occurring when the orchestration runs an orchestration component, then it’s more likely to be from AWS/EC2.

In your AWS console, please go to Redshift, select your cluster, go to the Cluster Performance tab and take a look at the Percentage disk space used graph while your job is running. If it gets to near 100% you might need to scale up the cluster size.
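The same figure can also be read from a SQL client rather than the console; a minimal sketch, assuming the standard Redshift STV_PARTITIONS system view (superuser visibility; used and capacity are reported in 1 MB blocks):

  -- Per-node disk usage; values near 100% explain Disk Full errors
  SELECT owner AS node,
         SUM(capacity) / 1024 AS capacity_gb,
         SUM(used) / 1024 AS used_gb,
         ROUND(SUM(used) * 100.0 / SUM(capacity), 1) AS pct_used
  FROM   stv_partitions
  GROUP  BY owner
  ORDER  BY owner;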

Best regards,
Ian


Tom Flynn —

Thanks Ian. I'm still trying to gain access to our Console but from what I've been able to see on Redshift I don't think we are out of space. Do you know of other things that can cause Redshift Disk Full errors? This was a pretty large orchestration that ran for about 90 minutes before we saw the error. Just looking for other items to investigate.

Thanks

Tom


Matillion Agent  

Kalyan Arangam —

Hi Tom,

I happened to run into this yesterday and, besides deleting some tables, I ran VACUUM DELETE ONLY in Redshift to recover space.
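A minimal sketch of that reclaim step, with a placeholder table name (DELETE ONLY reclaims space left by deleted rows without re-sorting; run with no table name it covers the whole database):

  -- Reclaim space left behind by deleted/updated rows, without re-sorting
  VACUUM DELETE ONLY my_schema.my_large_table;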

Please try the following to check disk usage over the past 24 hours:

  • Visit the AWS console and go to the details of your Redshift cluster
  • Switch to the Cluster Performance tab
  • Change the Time Range to 24 hours
  • Locate the charts for Write Throughput, Health Status and Percentage disk space used

Please share these with us if that's okay.

Best
Kalyan


Tom Flynn —

Thanks Kalyan. I am still trying to gain the appropriate access to the console to make the changes you suggested. I believe that the disk full error is occurring during a very large update that uses a large select statement (several joins, etc.). I'd imagine this is the culprit. Assuming so, do you have any suggestions on what to do in this situation?


Thanks


Tom


Matillion Agent  

Kalyan Arangam —

Hi Tom,

Assuming that’s the case, I would recommend breaking down your transform into multiple jobs.

Each job would perform a portion of the operation and write the result to a temp table. The next job starts from that temp table and performs the rest of the operations. Start by breaking the work into two transforms, and use more if necessary.

Also consider selecting only the columns you need, and whether all columns really have to be carried across the entire operation. This helps reduce the amount of data Redshift has to work with.
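In plain SQL terms, the staging pattern described above looks roughly like the sketch below; table and column names are placeholders, not taken from Tom's job:

  -- Job 1: materialise the heavy joins into a regular staging table,
  -- keeping only the columns the later steps actually need
  -- (a true TEMP table would not survive between separate jobs/sessions)
  CREATE TABLE stage_orders_customers AS
  SELECT o.order_id, o.order_date, o.amount, c.customer_id, c.region
  FROM   orders o
  JOIN   customers c ON c.customer_id = o.customer_id;

  -- Job 2: start from the staging table and finish the update
  UPDATE fact_sales
  SET    region = s.region
  FROM   stage_orders_customers s
  WHERE  fact_sales.order_id = s.order_id;

  -- Clean up once the run has finished
  DROP TABLE stage_orders_customers;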

Hope that helps.

Best
Kalyan


Tom Flynn —

I am still running into Disk Full errors. My question is: what can I do with the information that comes with the error:

error: Disk Full
code: 1016
context: node:3
query: 381611
location: fdisk_api.cpp:486
process: query3_94 [pid=30297]


Is there somewhere else I can query with this information to get more detail? This job was processing about 30GB. Is that within the size that Matillion can process?
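For reference, the query id in that error can be looked up in Redshift's system tables from any SQL client; a minimal sketch, assuming the standard STL_QUERY and SVL_QUERY_SUMMARY views:

  -- Find the failing statement by the query id reported in the error
  SELECT query, starttime, endtime, aborted, TRIM(querytxt) AS sql_text
  FROM   stl_query
  WHERE  query = 381611;

  -- See which steps of that query spilled to disk (is_diskbased = 't')
  SELECT query, seg, step, rows, workmem, is_diskbased
  FROM   svl_query_summary
  WHERE  query = 381611
  ORDER  BY seg, step;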

Sincerely,

Tom


Matillion Agent  

Harpreet Singh —

Hi Tom,

Can you check the skew on the tables you are using in the joins? Try changing the DIST key if they are skewed.
It's possible they are not distributed properly. If the query that's failing has a join clause, there's a good chance that's what's causing your errors. When Redshift executes a join, it has a few strategies for connecting rows from different tables together. By default, it performs a “hash join” by creating hashes of the join key in each table, and then distributing them to the other nodes in the cluster. That means each node has to store hashes for every row of the table. When joining large tables, this quickly fills up disk space.
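A minimal sketch of that skew check, assuming the standard SVV_TABLE_INFO system view (skew_rows compares the fullest slice to the emptiest, so values close to 1 are good); the table names are placeholders:

  -- Row skew per table: large skew_rows values mean a poor DIST key choice
  SELECT "schema", "table", diststyle, tbl_rows, skew_rows, pct_used
  FROM   svv_table_info
  WHERE  "table" IN ('orders', 'customers')
  ORDER  BY skew_rows DESC;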

Thanks
Harpreet


Matillion Agent  

Ian Funnell —

Hi Tom,

Just wanted to check back and see if any of the suggestions helped get the job to run successfully?

  • Splitting the work into multiple separate updates rather than trying to do it all at once (which seems to be requiring a lot of intermediate spool space inside Redshift)
  • Reducing the number of columns you are working with
  • Looking at the distribution keys of the tables involved, to check if a redistribution could make it more efficient (a deep-copy sketch follows this list)
  • Checking the cardinality of the values in the joins
  • Monitoring disk usage in your Redshift console, and considering adding nodes if the usage gets near 100%
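On the distribution-key point, the long-standing way to change a DIST key is a deep copy into a new table (newer Redshift releases also offer ALTER TABLE … ALTER DISTKEY); a minimal sketch with placeholder names, co-locating both join tables on the same key so hashed rows don't have to be redistributed at query time:

  -- Rebuild the table distributed on the join key, then swap it in
  CREATE TABLE orders_bykey
    DISTKEY (customer_id)
    SORTKEY (order_date)
  AS
  SELECT * FROM orders;

  ALTER TABLE orders RENAME TO orders_old;
  ALTER TABLE orders_bykey RENAME TO orders;
  DROP TABLE orders_old;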

You’ll get the exact same error if you run that same SQL using another client (like SQL Workbench) because it’s a limitation within Redshift, and not something that Matillion has any influence on.

Best regards,
Ian
