Using Changed Data Capture

Changed Data Capture (CDC) is an SAP Data Services feature that tracks and reports the changes that have occurred in your data source since the last time you imported data from it and enables you to update the relevant targets in your data flows by applying those changes. To enable this feature, set the CDC Enable option to true in your adapter datastore.

When you import a table using a datastore that has CDC enabled, the Simba Salesforce SAP Data Services Adapter generates the following three columns in the imported table:

  • DI_OPERATION_TYPE: This column contains the type of change that was made to the data (INSERT, UPDATE, or DELETE).
  • DI_SEQUENCE_NUMBER: This column contains the sequence number. For example, a changed data record with sequence number 3 would be the third change that was made during the CDC time interval.
  • SFDC_TIMESTAMP: This column contains the timestamp from Salesforce.com indicating when the change occurred.
Important:

When CDC is enabled, the tables in the datastore cannot be used as target objects in data flows.

Setting a Start Date for Changed Data Capture

Normally, a CDC operation reads every change in the data source that has occurred within the Salesforce.com retention period. To avoid redundancy and optimize the process, you can enable check-points or use the date specified in the CDC Start Date source object parameter.

Check-points mark the date of your last CDC read so that the next CDC operation only reads changes that have occurred after that date. Each check-point is associated with a subscription name, which identifies the user that executed the CDC operation. Subscription names enable the data source to contain multiple unique check-points and to support cases where multiple users are executing CDC operations on the same data. To enable check points, on the CDC Options tab of a source object, select the Enable Check-Point check box and type a name in the CDC Subscription Name field.

Alternatively, you can specify a start date using the CDC Start Date parameter available on the Adapter Source tab of a source object. CDC reads start from this date if the date occurs within the Salesforce.com retention period and either of the following conditions are met:

  • The start date occurs after the check-point date.
  • Or, there is no check-point date available.

Otherwise, CDC reads start from the check-point date.

Designing a Data Flow with Changed Data Capture

The following procedure describes one method of using CDC. For more information about CDC, see the SAP Data Services Designer Guide on the SAP Help Portal: https://help.sap.com/viewer/p/SAP_DATA_SERVICES.

To design a data flow with Changed Data Capture:

  1. In the Advanced Options pane for your adapter datastore, set the CDC Enable option to true.
  2. Use the adapter datastore to import the table that you want to check for changes.
  3. Create a data flow that uses the CDC-enabled table as the source object, and then connect the source to a Query transform.
  4. Double-click the Query transform and then map the columns from the CDC table to the Query transform by selecting all the nodes in the Schema In pane and dragging them into the Schema Out pane.
  5. In the data flow, double-click the source object, then click the CDC Options tab, and then configure the following options:
    1. In the CDC Subscription Name field, type a unique name for identifying a checkpoint.
    2. To enable check-points so that CDC operations start from the date of the last read instead of reading every change in the Salesforce.com retention period, select the Enable Check-Point check box.
    3. Important:

      Do not select the Get Before-Image For Each Update Row check box. This feature is not supported in Salesforce.com data sources.

  6. Click the Adapter Source tab and then configure the following options:
    1. In the Check-Point Column field, type a name for the column that contains check-point timestamps.
    2. In the CDC Start Date field, type a start date for your CDC request using the format yyyy.mm.dd 24h:mm:ss. For example, 2015.12.31 18:30:00.
    3. Note:

      Salesforce.com does not take seconds into account for getUpdated() and getDeleted() requests because the Salesforce API truncates dateTime values. For more information, see "Polling for Changes" in the Salesforce.com documentation: https://www.salesforce.com/developer/docs/api/Content/polling_for_changes.htm.

    4. To include records that have been deleted from the data source in your CDC results, set the Disable CDC Deleted Records option to no.
  7. In the data flow, add a Map_CDC_Operation transform and then connect the Query transform to it.
  8. Double-click the Map_CDC_Operation transform, and then verify that the Sequencing column is set to DI_SEQUENCE_NUMBER and the Row Operation column is set to DI_OPERATION_TYPE.
  9. In the data flow, add the table to which you want to apply the changes tracked by CDC, then set the table as the target object, and then connect the Map_CDC_Operation transform to the target.

When you run the job that contains this data flow, the CDC-enabled source object retrieves the changes that have been made to the table in the database, and then these changes are applied to the target object.

Operation Mapping in CDC

A CDC operation returns a record for each individual change made in the Salesforce.com data. For example, if a row has been inserted in the Salesforce.com data source, then the CDC operation returns an INSERT record in Data Services.

When multiple changes occur to the same row in the data source and the final change is a DELETE operation, CDC represents these changes by returning only a single DELETE record in Data Services.

When a row is inserted and then updated, CDC returns an INSERT record and an UPDATE record, but both records shows the results of updating the row. The records do not show the state of the row between the INSERT and UPDATE operations.