Configure Discrete Perturbation

Prerequisites

To use perturbation, you must have:

R Keys in the unit records.
A perturbation table (PTable) file for each dataset. This file is in CSV format and has the extension .pert.

Module Properties

The Perturbation module has the following properties you can configure.

Property

Description

RKEY

Set this to true to use the R Keys in the unit records.

FREQ

Set this to true to perturb cell values based on the contribution count rather than the cross-tabulation (cell) value.

SmallN

An integer that controls how to look up adjustments from the perturbation table when the count being perturbed is larger than the width of the PTable. Its value can be 10 or below (the default value is 10).

An important feature of perturbation is that all cells get perturbed, including large values. When determining the offset to apply to a given cell, the perturbation algorithm uses the count to select a column from the PTable. For example, if the count is 8, then a perturbation offset is picked from column 8 in the PTable (with the R Keys being used to identify which row in that column to use).

The SmallN setting allows perturbation to work for larger values, avoiding the need to create a PTable big enough to account for the largest possible count in the dataset. It does this by reusing the rightmost columns of the PTable for larger counts.

The methodology is as follows:

For counts up to and including PTableSize - SmallN, the column matching the count is used. For example:
- With a PTableSize of 30 and a SmallN of 10, counts of 1 to 20 use columns 1 to 20 respectively.
- With a PTableSize of 29 and a SmallN of 10, counts of 1 to 19 use columns 1 to 19 respectively.
For counts greater than PTableSize - SmallN, the algorithm divides the count by SmallN and gets the remainder, then adds this to PTableSize + 1 - SmallN.

For example:

PTableSize = 30, SmallN = 10

If Count Is...	Lookup From Column	Notes
25	26	Remainder of 25/10 is 5; 30 + 1 - 10 + 5 = 26
30	21	Remainder of 30/10 is 0; 30 + 1 - 10 + 0 = 21
31	22	Remainder of 31/10 is 1; 30 + 1 - 10 + 1 = 22

PTableSize = 29, SmallN = 10

If Count Is...	Lookup From Column	Notes
25	25	Remainder of 25/10 is 5; 29 + 1 - 10 + 5 = 25
30	20	Remainder of 30/10 is 0; 29 + 1 - 10 + 0 = 20
31	21	Remainder of 31/10 is 1; 29 + 1 - 10 + 1 = 21
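The lookup rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the tables, not the module's actual code: the function name `lookup_column` is hypothetical, and the handling of counts below 1 is an assumption made for this sketch.

```python
def lookup_column(count: int, ptable_size: int = 30, small_n: int = 10) -> int:
    """Return the 1-based PTable column used for a given count.

    Illustrative sketch of the SmallN rule described in this section.
    """
    if count < 1:
        # Assumption: this sketch only covers counts of 1 or more.
        raise ValueError("count must be at least 1")

    # Counts up to and including PTableSize - SmallN use their own column.
    if count <= ptable_size - small_n:
        return count

    # Larger counts reuse the rightmost columns:
    # remainder of count / SmallN, added to PTableSize + 1 - SmallN.
    return ptable_size + 1 - small_n + (count % small_n)


if __name__ == "__main__":
    for size in (30, 29):
        for count in (25, 30, 31):
            print(f"PTableSize={size} count={count} -> column {lookup_column(count, size)}")
```

With the default sizes, `lookup_column(25)` gives 26 and `lookup_column(30)` gives 21, which matches the first table above.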

RULESET

This setting is optional and not required for the default behaviour. Use this option to perturb other results:

Set this to the value "PERT()" (or omit the RULESET property altogether) for the default behaviour: discrete perturbation will apply to counts in the cross tabulation results.
Use the following optional parameters of PERT() to perturb another result set:
- The first parameter determines whether to perturb measures. Set this to true (perturb measures) or false (do not perturb measures). In most cases this should be set to false, as discrete perturbation is intended to perturb counts. If you have measures in your dataset, you should typically leave this set to false and either:
  - Use the perturbed estimates module to proportionately scale the discrete perturbation to the measures.
  - Use continuous perturbation, in cases where the measures have a small number of contributors that dominate the measure results (such as an income variable with a small number of high salaries).
- The second parameter identifies the destination for the perturbation calculation. If not specified this will be the cross tabulation results.
- The third parameter identifies the source of the perturbation calculation. If not specified this will use the same value as the destination.
You can specify multiple RULESET options: separate each one with a | character.

For example:

`method "Rule" perturbation addproperty RULESET "PERT()"`
The default behaviour. Perturb counts in the cross tabulation results.

`method "Rule" perturbation addproperty RULESET "PERT(true)"`
Perturb measures in the cross tabulation results.

`method "Rule" perturbation addproperty RULESET "PERT(true,RECORD_COUNT)"`
Perturb the output from the record count plugin. See the example below for more details on this configuration.

`method "Rule" perturbation addproperty RULESET "PERT()|PERT(true,RECORD_COUNT)"`
Perturb the counts in the cross tabulation results and the output from the record count plugin (see the example below).
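If you generate configuration commands from a script, the RULESET value can be assembled from its parts. The following Python sketch only builds the strings shown in the examples above. The helper names `pert` and `ruleset` are hypothetical, and the check that a later parameter requires the earlier ones is an assumption based on the parameters being positional.

```python
def pert(perturb_measures=None, destination=None, source=None) -> str:
    """Build a single PERT(...) entry from its optional positional parameters."""
    # Assumption: because the parameters are positional, a later parameter
    # can only be given when the earlier ones are also given.
    if destination is not None and perturb_measures is None:
        raise ValueError("destination requires the first parameter")
    if source is not None and destination is None:
        raise ValueError("source requires a destination")

    args = []
    if perturb_measures is not None:
        args.append("true" if perturb_measures else "false")
    if destination is not None:
        args.append(destination)
    if source is not None:
        args.append(source)
    return f"PERT({','.join(args)})"


def ruleset(*entries: str) -> str:
    """Join multiple RULESET entries with the | separator."""
    return "|".join(entries)


if __name__ == "__main__":
    print(ruleset(pert(), pert(True, "RECORD_COUNT")))
```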

PTableSize

The number of columns in the perturbation table. The default value is 30.

BigN

An integer to be used as the modulo base when adding R Keys. Must not exceed 2^32 (4294967296). The default value is 4294967296.
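As a rough illustration of what a modulo base means here, the sketch below adds a set of R Keys and wraps the total at BigN. The function name `combine_rkeys` is hypothetical, and how the module turns the combined key into a PTable row is not described in this section, so that step is deliberately left out.

```python
BIG_N = 2**32  # Default value of BigN (4294967296).


def combine_rkeys(rkeys, big_n: int = BIG_N) -> int:
    """Add the R Keys of the contributing records, modulo BigN.

    Illustrative only: the mapping from this value to a PTable row
    is not covered by this sketch.
    """
    total = 0
    for key in rkeys:
        total = (total + key) % big_n
    return total


if __name__ == "__main__":
    # The sum wraps around once it reaches BigN.
    print(combine_rkeys([4294967295, 2]))
```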

ConfidentialityModule

Set this to true to indicate to SuperSERVER that this module applies confidentiality rules. When set to true SuperSERVER will block access to the Record View feature.

Message

A message to be displayed to users in the client.

By default, these messages are displayed below the table. For SuperWEB2, you can change it so that the message appears above the table instead by setting table.tableMessageAboveTable to true in SuperWEB2’s configuration.properties file. This may be useful if you expect your users to create very large tables, as otherwise messages might be missed unless the user scrolls down to the end of the table.

PTable

The location of the perturbation file for this dataset.

If you do not set a value for the PTable property, then by default SuperSERVER expects this file to be saved in the same location as the SXV4 file, but with the extension .pert instead of .sxv4. For example, if the SXV4 file is C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.sxv4 then the perturbation file is expected to be located at C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.pert
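If you want to check from a script where the default file is expected, the rule above amounts to swapping the file extension. A minimal Python sketch follows, with a hypothetical helper name `default_ptable_path`. It uses `PureWindowsPath` so that the Windows path is handled the same way on any operating system.

```python
from pathlib import PureWindowsPath


def default_ptable_path(sxv4_path: str) -> str:
    """Return the default .pert location for a given SXV4 file path."""
    return str(PureWindowsPath(sxv4_path).with_suffix(".pert"))


if __name__ == "__main__":
    print(default_ptable_path(
        r"C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.sxv4"
    ))
```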

If you want to use a different location for the file, then you can set the value of PTable to the location of the .pert file. You can either use an absolute path or a relative path (relative to the SuperSERVER program data directory, which is C:\ProgramData\STR\SuperSERVER SA if you installed to the default location).

Any backslashes in the path will need to be escaped with an additional backslash (forward slashes can also be used but do not need to be escaped). For example:

method "Rule" perturbation addproperty PTable "C:\\ProgramData\\STR\\SuperSERVER SA\\databases\\my-ptable.pert"
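If you build this command in a script, the escaping can be done programmatically. The following Python sketch doubles each backslash and formats the command in the same shape as the example above. The helper name `ptable_command` is hypothetical.

```python
def ptable_command(method_id: str, plugin_id: str, path: str) -> str:
    """Format an addproperty PTable command with backslashes escaped."""
    # Each backslash in the path needs an additional backslash.
    escaped = path.replace("\\", "\\\\")
    return f'method "{method_id}" {plugin_id} addproperty PTable "{escaped}"'


if __name__ == "__main__":
    print(ptable_command(
        "Rule",
        "perturbation",
        r"C:\ProgramData\STR\SuperSERVER SA\databases\my-ptable.pert",
    ))
```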

If the contents of the perturbation file are modified in any way, you must restart SuperSERVER in order for the change to take effect. This is for performance reasons (SuperSERVER caches the perturbation file so that it does not have to reload and parse it on every tabulation).

PropagateZeroes

Whether to propagate zeros between fact tables.

This setting is designed to counter a potential attack vector for perturbation in cases where it might be possible to use information from one fact table to infer that a value for a related fact table, which has been perturbed to zero, is not a true zero.

For example, consider a case where there is a household fact table and a people fact table describing the people within those households. An attacker might be able to create a table that counts households where a particular cell has one true record, representing a single household. The value of this cell would be perturbed to zero. However, if the attacker changes the table so that it counts people instead, and sees that there is now a value returned for that cell (because the number of people in that single household is large enough not to be perturbed to zero), then the attacker now knows that there is in fact at least one household record for that cell.

Propagate zeroes is designed to counter this attack by propagating the perturbed zero between fact tables. In this example, it would propagate the perturbed zero from the household level down to the people level. The table counting people would therefore also return a zero value, regardless of how many people there actually are in that household.
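The household and people example can be expressed as a small Python sketch. It is a simplified model of one-way propagation from a parent to a child fact table, not the module's implementation: the function name is hypothetical, and the sketch ignores any propagation threshold.

```python
def people_cell_after_propagation(household_perturbed: int, people_perturbed: int) -> int:
    """Value shown for the people-level cell when zeroes propagate downwards.

    Simplified model: if the matching household-level cell was perturbed
    to zero, the people-level cell is also reported as zero.
    """
    if household_perturbed == 0:
        return 0
    return people_perturbed


if __name__ == "__main__":
    # One true household perturbed to zero, containing six people.
    print(people_cell_after_propagation(household_perturbed=0, people_perturbed=6))
```

In this model the attacker sees a zero at both levels, so switching the table from households to people reveals nothing new about that cell.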

This setting also coordinates perturbation with measures: if a fact table count is perturbed to zero, then the measures for that fact table will also be perturbed to zero.

To apply zero propagation, use the following command:

method <method_id> perturbation addproperty PropagateZeroes {"All"|"Ancestor"|"Same"|"None"}

The available settings are:

Ancestor	Propagation only happens one way: from ancestor to descendant fact tables. For example, from the (parent) household fact table to the (child) people fact table.
All	Propagation happens both ways to all fact tables.
Same	Propagation happens only from the count to the measures within the same fact table.
None	No propagation happens (default).

For example:

method "Rule" perturbation addproperty PropagateZeroes "Ancestor"

PropagateZeroesFieldValues

An optional setting that allows you to configure the propagate zeroes feature so that it only runs if specific fields or field values are in the table.

You can choose to specify:

An entire field, in which case propagate zeroes will apply if any value from that field is in the table; or
A field followed by a comma separated list of values from that field, in which case propagate zeroes will apply if any one of those specific values is in the table.

To apply conditional propagate zeroes, use one of the following commands:

method <method_id> perturbation addproperty PropagateZeroesFieldValues "<fact_table_id>__<field_id>"

method <method_id> perturbation addproperty PropagateZeroesFieldValues "<display_name>"

method <method_id> perturbation addproperty PropagateZeroesFieldValues "<fact_table_id>__<field_id>,<field_value_id>,<field_value_id>..."

method <method_id> perturbation addproperty PropagateZeroesFieldValues "<display_name>,<field_value_display_name>,<field_value_display_name>..."

You can specify multiple fields, in which case propagate zeroes will apply if any one of those fields or the specified field values are in the table. If you want to list a large number of fields, it may be easier to use the PropagateZeroesFieldValuesFile option instead, which allows you to use a text file to specify the list.

As shown above, you can either use display names or IDs when specifying the list. IDs are strongly recommended, as they ensure that field values will be considered even if they are in group recodes.

You can obtain IDs from SuperADMIN using the cat command:

cat <dataset_id> <field_name>

For example:

> cat bank Gender
[ XTAB Field : 'Gender' ]
    [ ID : 'SXV4__Retail_Banking__F_Customer__Gender_FLD' ]
    [ Value Set : 'SXV4__Retail_Banking__C_Gender' ]

In this example the fact table ID is F_Customer and the field ID is Gender. To configure propagate zeroes to only apply if the Gender field is in the table, you would use a command similar to the following:

method "Rule" perturbation addproperty PropagateZeroesFieldValues "F_Customer__Gender"

When specifying field values, you can also use either display names or IDs. You can obtain the IDs in SuperADMIN as follows:

cat <dataset_id> <field_name> values

For example:

> cat bank 'Marital Status' values
[ Value : 'Single' (id:S) ]
[ Value : 'Married' (id:M) ]
[ Value : 'Divorced' (id:D) ]
[ Value : 'Unknown' (id:U) ]
[ Value : 'Not Applicable' (id:-1) ]

To configure propagate zeroes to only apply if Married or Divorced are in the table, you would use a command similar to the following:

method "Rule" perturbation addproperty PropagateZeroesFieldValues "F_Customer__Marital_Status,M,D"
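The matching rule described in this section (a whole field, or a field plus specific values) can be sketched as follows. This is an illustrative Python model, not the module's parser: the function names and the `table_fields` structure are hypothetical, each entry is assumed to be one comma-separated string using IDs, and names that themselves contain commas are not handled.

```python
def parse_entry(entry: str):
    """Split an entry into its field ID and an optional set of value IDs."""
    field, *values = [part.strip() for part in entry.split(",")]
    return field, set(values)


def should_propagate(entries, table_fields) -> bool:
    """Return True if any configured field or field value is in the table.

    table_fields maps each field ID in the table to the set of value IDs
    from that field that appear in the table (hypothetical structure).
    """
    for entry in entries:
        field, values = parse_entry(entry)
        if field not in table_fields:
            continue
        # A bare field matches on any value; otherwise one listed value is enough.
        if not values or values & table_fields[field]:
            return True
    return False


if __name__ == "__main__":
    entries = ["F_Customer__Gender", "F_Customer__Marital_Status,M,D"]
    table = {"F_Customer__Marital_Status": {"S", "M"}}
    print(should_propagate(entries, table))
```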

If you use the PropagateZeroesFieldValues setting, then you must not use PropagateZeroesFieldValuesFile. Only one of these settings can be used at a time.

PropagateZeroesFieldValuesFile

An optional setting that allows you to specify a CSV file containing a list of fields or field values where you want propagate zeroes to apply.

Fields and field values are specified in the same way as when using PropagateZeroesFieldValues.

method <method_id> perturbation addproperty PropagateZeroesFieldValuesFile "<filename>"

Save your file to the SuperSERVER data directory (C:\ProgramData\STR\SuperSERVER SA).

For example:

method "Rule" perturbation addproperty PropagateZeroesFieldValuesFile "propagate-zeroes.csv"

propagate-zeroes.csv

method "Rule" perturbation addproperty PropagateZeroesFieldValues "F_Customer__Gender"
method "Rule" perturbation addproperty PropagateZeroesFieldValues "F_Customer__Marital_Status,M,D"

If you use the PropagateZeroesFieldValuesFile setting, then you must not use PropagateZeroesFieldValues. Only one of these settings can be used at a time.

PropagateZeroesThreshold

The propagation threshold. Use the threshold to control whether a cell can be set to zero by zero propagation from a related level/record count:

method <method_id> perturbation addproperty PropagateZeroesThreshold "<number>"

If the record count of a cell is less than or equal to this threshold, then it can be set to zero by zero propagation.

For example, the following command ensures that cells with record counts of 5 or less can be set to zero:

method "Rule" perturbation addproperty PropagateZeroesThreshold "5"
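The effect of the threshold can be modelled with a short Python sketch. It assumes a single parent and child cell and treats the threshold as an upper limit on the child cell's record count. The function and parameter names are hypothetical, and the behaviour when no threshold is configured is not covered.

```python
def apply_zero_propagation(parent_perturbed: int, child_perturbed: int,
                           child_record_count: int, threshold: int) -> int:
    """Return the child cell value after threshold-limited zero propagation.

    Simplified model: the child cell is only set to zero when the related
    parent cell was perturbed to zero AND the child's record count is less
    than or equal to the threshold.
    """
    if parent_perturbed == 0 and child_record_count <= threshold:
        return 0
    return child_perturbed


if __name__ == "__main__":
    # With a threshold of 5, a child cell with 5 records can be zeroed,
    # but a child cell with 6 records keeps its perturbed value.
    print(apply_zero_propagation(0, 4, 5, threshold=5))
    print(apply_zero_propagation(0, 7, 6, threshold=5))
```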

QUANTILEVALIDATION

QUANTILEPTABLE

QUANTILECONFIG

The location of the configuration files for quantile perturbation. By default, these files should be in the same location as the SXV4 file, but you can use these properties to set an alternative location. See Quantiles and Ranges for more details.

Apply the Plugin

Log in to SuperADMIN and create a new method:
```
> method addmethod perturbation_method
```
This example sets the ID of the new method to perturbation_method. This ID will be used in all the following examples, although you can replace this with your preferred ID if you wish.
Add the Perturbation Data Control plugin to the method:
```
> method perturbation_method adddcplugin perturbation Perturbation
```
This example sets the ID of the plugin within this method to perturbation. You can replace this with your preferred ID.
The Perturbation at the end of this command is the library name for the perturbation module. This is case sensitive and must be specified exactly as shown here.

Set the plugin properties:

> method perturbation_method perturbation addproperty RKEY "true"
> method perturbation_method perturbation addproperty FREQ "true"
> method perturbation_method perturbation addproperty "SmallN" "10"
> method perturbation_method perturbation addproperty "PTableSize" "30"
> method perturbation_method perturbation addproperty "BigN" "4294967296"
> method perturbation_method perturbation addproperty ConfidentialityModule "true"
> method perturbation_method perturbation addproperty Message "Data has been perturbed"

Assign the method to a dataset (in this example we are assigning the method to a dataset with the ID bank):

> cat bank addmethod perturbation_method

You can review the method details using the command cat <dataset_id> methods details <method_id>:

> cat bank methods details perturbation_method
[ Method : perturbation_method (id:perturbation_method) (type:mandatory) ]
    [ Common ]
    [ DCPlugin : Perturbation (id:perturbation) (priority:1) ]
        [ RKEY : true ]
        [ FREQ : true ]
        [ SmallN : 10 ]
        [ PTableSize : 30 ]
        [ BigN : 4294967296 ]
        [ ConfidentialityModule : true ]
        [ Message : Data has been perturbed ]

Perturbation with Weighted Datasets

If you have weighted datasets, then you must apply an additional data control module, Average_cellwgt, to your perturbation methods. This module effectively scales up the perturbed amount to account for the weighting.

How does this module work?

The average cell weighting module calculates the unweighted cell value, applies perturbation to this, and then multiplies the result by the average weight of the cell (calculated as the weighted value divided by the unweighted value):

(unweighted count + perturbation factor) * (weighted count / unweighted count)

This ensures that the effect of perturbation is scaled up appropriately to account for the weighting.
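As a worked illustration of the formula above, the following Python sketch applies it to one cell. The function name is hypothetical, the perturbation is treated as an additive offset to the unweighted count (as in the formula), and returning zero for an empty cell is an assumption made to avoid dividing by zero.

```python
def weighted_perturbed_value(unweighted_count: int, weighted_count: float,
                             perturbation: int) -> float:
    """Scale a perturbed unweighted count by the cell's average weight."""
    if unweighted_count == 0:
        # Assumption: an empty cell has no average weight, so report zero.
        return 0.0
    average_weight = weighted_count / unweighted_count
    return (unweighted_count + perturbation) * average_weight


if __name__ == "__main__":
    # 10 records representing a weighted count of 250 (average weight 25),
    # with a perturbation of -2 applied to the unweighted count.
    print(weighted_perturbed_value(10, 250.0, -2))
```

In this example the perturbed unweighted count is 8, so the reported weighted value is 8 * 25 = 200 rather than 250.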

When using weighted datasets:

The average cell weight module must be added to the method after the perturbation module, as it uses the result of the perturbation as part of its calculation.
The FREQ property must be set to true.

The following is a complete example of perturbation with weighted datasets:

method addmethod weighted_perturbation_example

method weighted_perturbation_example adddcplugin weighted_perturbation Perturbation
method weighted_perturbation_example weighted_perturbation addproperty RKEY "true"
method weighted_perturbation_example weighted_perturbation addproperty "SmallN" "10"
method weighted_perturbation_example weighted_perturbation addproperty "PTableSize" "30"
method weighted_perturbation_example weighted_perturbation addproperty "BigN" "4294967296"
method weighted_perturbation_example weighted_perturbation addproperty ConfidentialityModule "true"
method weighted_perturbation_example weighted_perturbation addproperty Message "Data has been perturbed"

method weighted_perturbation_example adddcplugin Average_cellwgt Average_cellwgt

method weighted_perturbation_example common addproperty FREQ "true"

Make Perturbation of Measures Consistent

If you have measures in your dataset, you can use the perturbed estimates module to scale the perturbation proportionately to those measures. Simply add the perturbed estimates module to run after perturbation (and after Average_cellwgt in the case of weighted datasets).

For example:

method perturbation_method adddcplugin perturbation Perturbation

method perturbation_method perturbation addproperty RKEY "true"
method perturbation_method perturbation addproperty FREQ "true"
method perturbation_method perturbation addproperty "SmallN" "10"
method perturbation_method perturbation addproperty "PTableSize" "30"
method perturbation_method perturbation addproperty "BigN" "4294967296"
method perturbation_method perturbation addproperty ConfidentialityModule "true"
method perturbation_method perturbation addproperty Message "Data has been perturbed"

method perturbation_method adddcplugin perturbedestimates perturbedestimates

See Perturbed Estimates for more details.