We're using the Table Output component to write rows to TargetTable. The TargetTable has been defined with PRIMARY KEY ("rowid"). RowId gets its values from our source system. The Table Output component has Truncate option set to Append. What happens is that in the target table, we have multiple entries with the same RowId. So clearly RowId is not the primary key. Then what is?
Actually, we're happy with the results, but I want to understand what's going on. We want to keep the duplicates for the RowId, so the UPDATE should not overwrite existing entries with the same RowId. To be on the safe side, should we add more fields to make a truly unique primary key, or is the Append option enough to make sure we don't overwrite. And, in that case, what is the primary key?
1 Community Answers
Kalyan Arangam —
Unlike a traditional database, Redshift does not enforce uniqueness for the primary key constraint. You are allowed to maintain duplicates and redshift will not complain.
As for the Table Output component, it does not make use of (or need) a primary key. it simply does a Truncate+Load or Append operation on the target table. As you already know, APPEND can lead to duplicates.
If you require a primary key then you may insert a create-datetime field which holds the time when the new record was inserted. This should (in most cases) help uniquely identify a record if necessary.