Name is required.
Email address is required.
Invalid email address
Answer is required.
Exceeding max length of 5KB

Recommended method to create surrogate keys

Hello,

Is there a recommended way to create surrogate keys that increment by 1 on a multi-node redshift cluster? I've tried the identity column but it does not seem to increment by 1 on each new record.

1 Community Answers

Matillion Agent  

Kalyan Arangam —

Hi Kevin,

You might find it easier to solve this kind of problem using a windowing function rather than the sequence generator.
From your input data, use a calculator component to add a new column, say “rn”, with function: ROW_NUMBER() OVER (ORDER BY some_column)

You can choose any column to order by. An update timestamp is usually a good choice. That will give every row an ID ascending from 1.

Now you need to take into account the IDs that are already in your target table, so:

1.Use a Table Input from your real, target table, choosing only the ID
2.Aggregate to find the max
3.Add a calculator with NVL, in case the target table is empty
4.Join to the input data using a dummy clause 1=1 (since there is only one row, you don’t actually need a real where clause)

Finally, you can add a calculator component to get the real, new surrogate ID by taking rn + max_id

In this way, new rows will get an ascending integer surrogate key, starting just above the max surrogate key that was already in the table.

Hope that makes sense,

Best regards,
Kalyan

Post Your Community Answer

To add an answer please login