To use perturbation, you must have:
- R Keys in the unit records.
- A perturbation table (PTable) file for each dataset. This file is in CSV format and has the extension .pert.
The Perturbation module has the following properties you can configure.
|Set this to |
|Set this to |
An integer that controls how values are looked up in the perturbation table when the cell value being perturbed is larger than the width of the PTable.
An important feature of perturbation is that all cell values get perturbed, even large values. The cell value determines which column in the PTable is used to look up the adjustment (for example, if the cell value is 8, the perturbation adjustment is selected from column 8). When the cell value is larger than the width of the PTable, the perturbation algorithm loops through the rightmost columns of the PTable (otherwise the PTable would need enough columns for every possible cell value).
Its value can be 10 or below. The default value is 10.
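The column lookup described above can be sketched in Python as follows. This is an illustration only, not SuperSERVER's implementation: the function name, the parameter names ptable_size and top_columns, and the exact wrap-around arithmetic are assumptions. The defaults of 30 and 10 are the documented defaults for the PTable width and for this property.

```python
def ptable_column(cell_value: int, ptable_size: int = 30, top_columns: int = 10) -> int:
    """Return the 1-based PTable column assumed to be used for a cell value.

    Assumption: values up to the table width map directly to the column with
    the same number; larger values cycle through the rightmost `top_columns`
    columns, starting again from the first of those columns.
    """
    if cell_value < 1:
        raise ValueError("this sketch only covers cell values of 1 or more")
    if not 1 <= top_columns <= ptable_size:
        raise ValueError("top_columns must be between 1 and ptable_size")
    if cell_value <= ptable_size:
        # Small values: the cell value is the column number.
        return cell_value
    # Large values: loop through the rightmost columns of the table.
    first_looped = ptable_size - top_columns + 1
    return first_looped + (cell_value - ptable_size - 1) % top_columns


if __name__ == "__main__":
    for value in (8, 30, 31, 40, 41):
        print(value, "->", ptable_column(value))
```

Under these assumptions a table of any size needs only a fixed number of columns, because every value above the table width is mapped back into the last few columns.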
For example, if the
Use this option to perturb other results:
|The number of columns in the perturbation table. The default value is 30.|
|An integer to be used as the modulo base when adding R Keys. Must not exceed 2^32 (4294967296). The default value is 4294967296.|
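As a rough illustration of what a modulo base means when R Keys are added together, the following Python sketch sums the R Keys of the records in a cell and wraps the total at the base. The function name and the assumption that R Keys are non-negative integers are hypothetical; how SuperSERVER then uses the combined key is not shown here.

```python
DEFAULT_MODULO_BASE = 2 ** 32  # 4294967296, the documented default and maximum


def combine_r_keys(record_keys, modulo_base: int = DEFAULT_MODULO_BASE) -> int:
    """Add the R Keys of all records in a cell, wrapping at the modulo base."""
    if not 0 < modulo_base <= 2 ** 32:
        raise ValueError("modulo base must be positive and must not exceed 2^32")
    total = 0
    for key in record_keys:
        # Reducing after every addition keeps the running total below the base.
        total = (total + key) % modulo_base
    return total


if __name__ == "__main__":
    print(combine_r_keys([4294967295, 2]))  # wraps around the default base
```

Because the result depends only on the set of records in the cell, the same cell always yields the same combined key, whichever table it appears in.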
Set this to
|A message to be displayed to users in the client.|
The location of the perturbation file for this dataset.
If you do not set a value for the PTable property, then by default SuperSERVER expects this file to be saved in the same location as the SXV4 file, but with the extension .pert instead of .sxv4. For example, if the SXV4 file is C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.sxv4 then the perturbation file is expected to be located at C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.pert
If you want to use a different location for the file, then you can set the value of PTable to the location of the .pert file. You can either use an absolute path or a relative path (relative to the SuperSERVER program data directory, which is C:\ProgramData\STR\SuperSERVER SA if you installed to the default location).
Any backslashes in the path will need to be escaped with an additional backslash (forward slashes can also be used but do not need to be escaped). For example:
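The path rules above can be illustrated with a short Python sketch. This is not SuperADMIN syntax. The helper names are hypothetical; the sketch only shows how the default .pert location is derived from the SXV4 path and what an escaped or forward-slash version of a path looks like.

```python
from pathlib import PureWindowsPath


def default_pert_path(sxv4_path: str) -> str:
    """Same folder and file name as the SXV4 file, with the extension .pert."""
    return str(PureWindowsPath(sxv4_path).with_suffix(".pert"))


def escape_backslashes(path: str) -> str:
    """Double every backslash so the path can be used as a property value."""
    return path.replace("\\", "\\\\")


def to_forward_slashes(path: str) -> str:
    """Forward slashes are an alternative and need no escaping."""
    return PureWindowsPath(path).as_posix()


if __name__ == "__main__":
    sxv4 = r"C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.sxv4"
    pert = default_pert_path(sxv4)
    print(pert)
    print(escape_backslashes(pert))
    print(to_forward_slashes(pert))
```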
Whether to propagate zeros between fact tables.
This setting is designed to counter a potential attack vector for perturbation in cases where it might be possible to use information from one fact table to infer that a value for a related fact table, which has been perturbed to zero, is not a true zero.
For example, consider a case where there is a household fact table and a people fact table describing the people within those households. An attacker might be able to create a table that counts households where a particular cell has one true record, representing a single household. The value of this cell would be perturbed to zero. However, if the attacker changes the table so that it counts people instead, and sees that there is now a value returned for that cell (because the number of people in that single household is large enough not to be perturbed to zero), then the attacker now knows that there is in fact at least one household record for that cell.
Propagate zeros is designed to counter this attack by propagating the perturbed zero between fact tables. In this example, it would propagate the perturbed zero from the household level down to the people level. The table counting people would therefore also return a zero value, regardless of how many people there actually are in that household.
This setting also coordinates perturbation with measures: if a fact table count is perturbed to zero, then the measures for that fact table will also be perturbed to zero.
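The household and people example above can be sketched in Python as follows. The function and argument names are hypothetical and the perturbed values are made up for illustration; the sketch only shows the rule that a perturbed zero at the related level overrides the value at the other level.

```python
def propagate_zero(related_perturbed: int, own_perturbed: int) -> int:
    """Return the value to publish for a cell after zero propagation.

    If the matching cell in the related fact table was perturbed to zero,
    this cell is also reported as zero; otherwise its own perturbed value
    is kept.
    """
    return 0 if related_perturbed == 0 else own_perturbed


if __name__ == "__main__":
    # One true household, perturbed to zero; six people counted in the same cell.
    households_perturbed = 0
    people_perturbed = 6
    print(propagate_zero(households_perturbed, people_perturbed))
```

With propagation in place, switching the table from households to people no longer reveals that the household cell was not a true zero.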
To apply zero propagation, use the following command:
The available settings are:
An optional setting that allows you to configure the propagate zeroes feature so that it only runs if specific fields or field values are in the table.
You can choose to specify:
To apply conditional propagate zeroes, use the following commands:
You can specify multiple fields, in which case propagate zeroes will apply if any one of those fields or the specified field values are in the table. If you want to list a large number of fields, it may be easier to use the
As shown above, you can use either display names or IDs when specifying the list. IDs are strongly recommended because they ensure that field values are considered even if they are in group recodes.
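The "any one of" rule for conditional propagation can be pictured with the following Python sketch. The function name is hypothetical, and the sketch uses display names for readability even though IDs are recommended; it is not the SuperSERVER implementation.

```python
from typing import Iterable


def condition_met(items_in_table: Iterable[str], configured_items: Iterable[str]) -> bool:
    """True if at least one configured field or field value is in the table."""
    return not set(items_in_table).isdisjoint(configured_items)


if __name__ == "__main__":
    configured = ["Married", "Divorced"]
    print(condition_met(["Married", "Single"], configured))  # one match is enough
    print(condition_met(["Single", "Widowed"], configured))  # no configured item present
```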
You can obtain IDs from SuperADMIN using the
In this example the fact table ID is
When specifying field values, you can also use either display names or IDs. You can obtain the IDs in SuperADMIN as follows:
To configure propagate zeroes to only apply if Married or Divorced are in the table, you would use a command similar to the following:
An optional setting that allows you to specify a CSV file containing a list of fields or field values where you want propagate zeroes to apply.
Fields and field values are specified in the same way as when using
Save your file to the SuperSERVER data directory (C:\ProgramData\STR\SuperSERVER SA).
The propagation threshold. Use the threshold to control whether a cell can be set to zero by zero propagation from a related level/record count:
If the record count of a cell is less than or equal to this threshold, then it can be set to zero by zero propagation.
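A minimal Python sketch of the threshold rule follows. The function and argument names are hypothetical; the sketch assumes the record count being tested is that of the cell that would receive the propagated zero.

```python
def apply_threshold(record_count: int, perturbed_value: int,
                    related_is_zero: bool, threshold: int) -> int:
    """Set a cell to zero by propagation only if its record count allows it."""
    if related_is_zero and record_count <= threshold:
        return 0
    # Cells above the threshold keep their own perturbed value.
    return perturbed_value


if __name__ == "__main__":
    # With a threshold of 5, a cell with 5 records can be zeroed, one with 6 cannot.
    print(apply_threshold(5, 7, True, 5))
    print(apply_threshold(6, 7, True, 5))
```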
For example, the following command ensures that cells with record counts of 5 or less can be set to zero:
|The location of the configuration files for quantile perturbation. By default, these files should be in the same location as the SXV4 file, but you can use these properties to set an alternative location. See Quantiles and Ranges for more details.|
Apply the Plugin
Log in to SuperADMIN and create a new method:
This example sets the ID of the new method to perturbation_method. This ID is used in all the following examples, although you can replace it with your preferred ID if you wish.
Add the Perturbation Data Control plugin to the method:
This example sets the ID of the plugin within this method to perturbation. You can replace this with your preferred ID.
Perturbation at the end of this command is the library name for the perturbation module. This is case sensitive and must be specified exactly as shown here.
Set the plugin properties:
Assign the method to a dataset (in this example we are assigning the method to a dataset with the ID
You can review the method details using the command cat <dataset_id> methods details <method_id>:
Perturbation with Weighted Datasets
If you have weighted datasets, then you must apply an additional data control module, Average_cellwgt, to your perturbation methods. This module effectively scales up the perturbed amount to account for the weighting.
The average cell weighting module calculates the unweighted cell value, applies perturbation to this, and then multiplies the result by the average weight of the cell (calculated as the weighted value divided by the unweighted value).
This ensures that the effect of perturbation is scaled up appropriately to account for the weighting.
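The calculation can be sketched in Python as follows. The function name is hypothetical, and the perturb argument is a stand-in for the PTable-based adjustment, which is not reproduced here; returning zero for an empty cell is also an assumption.

```python
from typing import Callable


def perturb_weighted_cell(weighted_value: float, unweighted_value: int,
                          perturb: Callable[[int], int]) -> float:
    """Perturb the unweighted count, then scale by the cell's average weight."""
    if unweighted_value == 0:
        # Assumption: an empty cell has no average weight and stays at zero.
        return 0.0
    average_weight = weighted_value / unweighted_value
    return perturb(unweighted_value) * average_weight


if __name__ == "__main__":
    # Stand-in perturbation that adds 1 to the unweighted count.
    result = perturb_weighted_cell(250.0, 5, lambda count: count + 1)
    print(result)  # average weight 50, perturbed count 6
```

In this made-up example, an adjustment of 1 record on the unweighted count becomes an adjustment of 50 on the weighted value, which is the scaling the module is intended to provide.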
When using weighted datasets:
- The average cell weight module must be added to the method after the perturbation module, as it uses the result of the perturbation as part of its calculation.
FREQ property must be set to
The following is a complete example of perturbation with weighted datasets: