Cleansing actions allow you to automatically clean up some of your source data when building your SXV4 database.
Cleansing actions apply to columns that can only be one of a set of possible values. This includes:
- Classified columns - columns in fact tables that should be one of a set of possible values defined in a linked classification table.
- Foreign key columns - columns in fact tables that should be one of a set of possible values defined in another fact table or in a multi-response fact table.
For example, the Gender column might be linked to a classification table containing the codes M (Male), F (Female), and U (Unknown). What should SuperCHANNEL do if it encounters an empty value for this column in one of the records in the fact table?
By specifying a cleansing action, you control what SuperCHANNEL should do when it encounters a value that does not match one of the possible values.
You can choose from the following cleansing actions:
|Add To Classification|
Add the new value to the classification table. This is the default action for classified columns.
This option is only available for classified columns; it cannot be used for foreign key columns to other fact tables or between a fact table column and a multi-response fact table. If you try to set these to use the add to classification action then SuperCHANNEL will automatically revert to skip.
|Bin||Convert the value to a specified "bin" value. The bin value must already exist in the corresponding classification table.|
|None||Do not capture exceptions. Use this action for non-SXV4 drivers.|
Skip this record.
This can be a useful cleansing action for a column linking two fact tables. For example if you want SuperCHANNEL to skip all Account records that do not have associated Customer records.
Use caution when selecting this option: your target may end up with fewer fact records than the source.
Stop the build process.
This is the default action for foreign key columns to other fact tables.
Define the Cleansing Action
- Open the Target View and the Target Attributes pane.
- In the Target View, select the fact table column.
In the Target Attributes, select the Action from the Cleansing drop-down list.
If you select the bin action, you must also enter the bin Value.
For hierarchical classifications, while it is possible to choose the add to classification action, this will have no effect. If a fact table value for a hierarchical classification is not found in the classification table, it cannot be inserted because the rest of the links in the hierarchy are not known (the fact table value is always the bottom value in the hierarchy, and needs to be joined to another classification at a higher level).
If you select add to classification for a hierarchical classification, SuperCHANNEL will generate warning messages in the logs similar to the following:
ERROR 15:53:00:104 [main] XMLReader3_0 - Warning - TargetColumn.setCleansingAction: Column 'City': Cannot set cleansing action to ADD for a hierarchical classification reference: defaulting to SKIP.
In this case, the cleansing action is automatically changed to "skip" and the build continues. If you see this error, you should check the message at the end of the build logs for this fact table to verify that all of your fact table rows have been channelled (i.e., that there are no records actually being skipped). This log message will be similar to the following:
channelled 256 rows (out of 256) successfully to "F_Customers"
If all rows have been successfully channelled, then you can treat the above warning messages as informational rather than errors, and can safely ignore them.
If any rows are skipped, then you will need to find the missing classification values, add these to your hierarchy classification tables, and then run the build again.
If you do not want to see these messages in the logs, set the cleansing action for the hierarchical field to something other than add to classification.