I saw the tutorial for pivoting tables that uses Transpose rows followed by Split.
The issue with this approach is that it is highly inefficient and does not scale well. LIST_AGG followed by SPLIT_PART is a computationally intensive operation which can become quite slow when concatenating / parsing long text values.
Instead, would it be possible to implement a pivot approach using GROUP BY with MAX(CASE...)
For instance for the following data:
key col val 1 A 1A 1 B 1B 2 A 2A 2 B 2B
You can use:
SELECT key, MAX(CASE WHEN col = 'A' THEN val END) AS val_a, MAX(CASE WHEN col = 'B' THEN val END) AS val_b GROUP BY 1
This scale super well and does the pivot + split steps in 1 go resulting:
key . val_a . val_b 1 . 1A . 1B 2 . 2A . 2B
2 Community Answers
Ian Funnell —
Thanks for this interesting suggestion, which we’ve been considering.
I agree that the SQL you’ve suggested is orders of magnitude faster than the LISTAGG/SPLIT_PART approach (and also similarly pushes down well on Spectrum).
But the two things actually achieve slightly different results, and so aren’t strictly comparable. LISTAGG does not deduplicate: it simply concatenates all the matching values into a long string, which then needs to be split (with careful ordering!) to create new columns. You would therefore need to do some pre-filtering and pre-aggregation anyway in order to use LISTAGG (in your example, filter WHERE col IN (‘A’, ‘B’) and GROUP BY col).
Currently we don’t have a transformation component which does exactly what you’re trying to do, so you’ll need to stick to using either the Calculator or the SQL components. But I agree it would be a nice addition to our functionality, and have added it as a feature request – internal reference EMD-4454.