Trying to pull our bigquery google analytics data into redshift. It doesn't look like there is an option to load daily files since it deletes table and reloads every time. So trying to pull in all of our data (~30GB) and getting the error: "Parameter Validation Failure:SQL Query - Timeout.). I can get it to work when limiting to a smaller subset.
Any suggestions on how to get this working for a very large data pull that I would want to setup daily?
3 Community Answers
Joseph Meyer —
I was originally trying to get hits level data which was 50 million rows. I shortened it to session level info which was around ~6 million rows and it still errors out. Seems like the limit must be pretty low, wondering if this can be overcome as matillion won't be useful for us if it can't. Not sure if it is a row limit or a data limit as it is ~30GB of data for both session and hit level pulls
There’s no hard limit to the number of rows you can pull, however I’d recommend only pulling a daily load. You can one days worth of data into a temporary table and then append it onto another table in a Transformation job.
Even if you wanted to do a complete load, I’d recommend splitting it up into separate timeframes, perhaps using an iterator and merge all of the tables together in a Transformation job.
The big advantage of loading daily when your source is from BigQuery is with BigQuery you pay for the amount of data you return.