Perturbation - Data Control
Small values in cross-tabulation results can reveal confidential information. Perturbation reduces that risk by adjusting cell values by small amounts.
A key (the cell key) is generated for each cell in the table from the RKEYs of the unit records that contribute to that cell. The cell key and the cell count are then used to look up a value in the perturbation table, which is predefined and deployed for each database. The value returned by this lookup is added to the cell value to produce the perturbed result.
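The lookup described above can be sketched in Python. This is a minimal illustration, not SuperSERVER's implementation. It assumes the cell key is the sum of the contributing RKEYs modulo BigN (consistent with the BigN property described under Module Properties), and it uses a hypothetical rule for choosing the PTable row (the cell count, capped at SmallN) and column (the cell key modulo PTableSize). The toy PTable is invented for the example.

```python
"""Minimal sketch of cell-key perturbation (illustrative, not SuperSERVER code).

Assumptions (hypothetical):
- every unit record carries an integer RKEY in the range [0, BigN)
- cell key = sum of the contributing RKEYs modulo BigN
- PTable row = cell count capped at SmallN; column = cell key % PTableSize
"""

BIG_N = 4294967296  # default BigN (2^32)
SMALL_N = 10        # default SmallN
PTABLE_SIZE = 30    # default PTableSize


def cell_key(rkeys, big_n=BIG_N):
    """Combine the RKEYs of all records contributing to a cell."""
    return sum(rkeys) % big_n


def perturb(count, rkeys, ptable, small_n=SMALL_N, ptable_size=PTABLE_SIZE,
            big_n=BIG_N):
    """Return the perturbed value for a cell with the given count and RKEYs."""
    key = cell_key(rkeys, big_n)
    row = min(count, small_n)   # hypothetical row selection
    col = key % ptable_size     # hypothetical column selection
    return count + ptable[row][col]


if __name__ == "__main__":
    # Toy PTable: row 0 never changes an empty cell; other rows add -1, 0 or +1.
    toy = [[0] * PTABLE_SIZE] + [
        [(c % 3) - 1 for c in range(PTABLE_SIZE)] for _ in range(SMALL_N)
    ]
    print(perturb(3, [5, 7, 11], toy))  # prints 4
```

In this sketch the cell key depends only on which records contribute, not on their order, so the same cell receives the same adjustment every time it is tabulated.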
If you need to use perturbation in your deployment, please contact Space-Time Research support (support@spacetimeresearch.com) for advice on the appropriate approach for your processing needs.
Perturbation
Prerequisites
To use perturbation, you must have:
- RKEYs in the unit records.
- A perturbation table (PTable) for each database, defined in a CSV file. By default, this file is saved in the same location as the database file, with the extension .pert (you can change the location with the PTable property).
Perturbation is not designed to work on its own with weighted databases. You may need an additional Data Control API module to calculate perturbed weighted values.
Module Properties
The Perturbation module has the following properties you can configure.
Property | Description |
---|---|
RKEY | Set this to true to use the RKEYs in the unit records. |
FREQ | Set this to true to perturb cell values based on the contribution count rather than the cross-tabulation (cell) value. |
SmallN | An integer used in deriving the lookup values for the perturbation table. The value must be 10 or less. The default value is 10. |
PTableSize | The size of the perturbation table. The default value is 30. |
BigN | An integer used as the modulo base when adding RKEYs. Must not exceed 2^32 (4294967296). The default value is 4294967296. |
ConfidentialityModule | |
Message | A message to be displayed to users in the client. |
PTable | The location of the perturbation file for this database. This setting is available from version 8.0.4.16 onwards; it is not available in SuperSTAR 8.0 GA. If you do not set PTable, SuperSERVER expects the file to be saved in the same location as the database file, with the extension .pert instead of .sxv4. For example, if the database file is C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.sxv4, the perturbation file is expected at C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.pert. To use a different location, set PTable to the path of the .pert file. You can use either an absolute path or a path relative to the SuperSERVER program data directory (C:\ProgramData\STR\SuperSERVER SA if you installed to the default location). Escape any backslashes in the path with an additional backslash; forward slashes can also be used and do not need to be escaped. |
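The default location and path rules for the PTable property can be expressed as a small Python helper. The function names and the relative folder used in the usage example are hypothetical; the sketch only encodes the rules stated above: a default .pert file next to the .sxv4 file, relative paths resolved against the program data directory, and backslashes doubled in the property value. `PureWindowsPath` is used so the Windows paths behave the same on any operating system.

```python
"""Sketch of the PTable location rules (hypothetical helper, not a SuperSERVER API)."""
from pathlib import PureWindowsPath

DEFAULT_PROGRAM_DATA = r"C:\ProgramData\STR\SuperSERVER SA"


def expected_pert_path(database_file, ptable=None, program_data=DEFAULT_PROGRAM_DATA):
    """Return the perturbation file location SuperSERVER would look for."""
    if not ptable:
        # No PTable value: same folder as the .sxv4 file, extension .pert
        return PureWindowsPath(database_file).with_suffix(".pert")
    # Relative paths are joined to the program data directory; joining an
    # absolute path (one with a drive letter) replaces the base entirely.
    return PureWindowsPath(program_data) / ptable


def escape_for_property(path):
    """Double every backslash so the path can be used as a property value."""
    return str(path).replace("\\", "\\\\")


if __name__ == "__main__":
    db = r"C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.sxv4"
    print(expected_pert_path(db))
    # prints C:\ProgramData\STR\SuperSERVER SA\databases\RetailBanking.pert
```

The string returned by `escape_for_property` is the form you would pass as the PTable value in an addproperty command.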
Apply the Plugin to a Database
Log in to SuperADMIN and create a new method:
```
> method addmethod perturbation_method
```
Add the Perturbation Data Control plugin to the method:
```
> method perturbation_method adddcplugin perturbation Perturbation
```
The perturbation module is named `Perturbation`. It is internal to SuperSERVER, and the name is case-sensitive.

Set the plugin properties:
```
> method perturbation_method perturbation addproperty RKEY "true"
> method perturbation_method perturbation addproperty FREQ "true"
> method perturbation_method perturbation addproperty "SmallN" "10"
> method perturbation_method perturbation addproperty "PTableSize" "30"
> method perturbation_method perturbation addproperty "BigN" "4294967296"
> method perturbation_method perturbation addproperty ConfidentialityModule "true"
> method perturbation_method perturbation addproperty Message "Data has been perturbed"
```
Assign the method to a database (in this example, the database ID is `bank`):
```
> cat bank addmethod perturbation_method
```
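If you configure several databases, it can help to generate the command sequence rather than type it. The helper below is a hypothetical convenience script (not part of SuperADMIN) that reproduces the command syntax from the steps above, printing one command per line so you can enter them in order. It wraps every value in double quotes and does not escape double quotes inside a value.

```python
"""Hypothetical helper that prints the SuperADMIN commands for a method.

The command syntax matches the steps in this section; values are wrapped in
double quotes, and quotes inside a value are NOT escaped.
"""


def build_commands(method_id, database_id, properties, plugin_id="perturbation"):
    """Return the commands that create, configure and assign a method."""
    commands = [
        f"method addmethod {method_id}",
        # The module name "Perturbation" is case sensitive.
        f"method {method_id} adddcplugin {plugin_id} Perturbation",
    ]
    for name, value in properties.items():
        commands.append(f'method {method_id} {plugin_id} addproperty {name} "{value}"')
    commands.append(f"cat {database_id} addmethod {method_id}")
    return commands


if __name__ == "__main__":
    example = {
        "RKEY": "true", "FREQ": "true", "SmallN": 10, "PTableSize": 30,
        "BigN": 4294967296, "ConfidentialityModule": "true",
        "Message": "Data has been perturbed",
    }
    for line in build_commands("perturbation_method", "bank", example):
        print(line)
```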
You can review the method details using the command `cat <database_id> methods details <method_id>`:
```
> cat bank methods details perturbation_method
[ Method : perturbation_method (id:perturbation_method) (type:mandatory) ]
[ Common ]
[ DCPlugin : Perturbation (id:perturbation) (priority:1) ]
[ RKEY : true ]
[ FREQ : true ]
[ SmallN : 10 ]
[ PTableSize : 30 ]
[ BigN : 4294967296 ]
[ ConfidentialityModule : true ]
[ Message : Data has been perturbed ]
```
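To confirm that every property was recorded as intended, you could capture the `methods details` output and compare it with the values you meant to set. This sketch assumes the bracketed `[ Name : value ]` layout shown in the example output; entries without a colon, such as `[ Common ]`, are ignored. Both function names are hypothetical.

```python
import re

# Matches "[ Name : value ]"; tokens without a colon, like "[ Common ]", are skipped.
_ENTRY = re.compile(r"\[\s*([^\[\]:]+?)\s*:\s*([^\[\]]*?)\s*\]")


def parse_method_details(output):
    """Return {name: value} for every "[ Name : value ]" entry in the output."""
    return dict(_ENTRY.findall(output))


def check_properties(output, expected):
    """Return {name: actual_value} for properties that differ from `expected`."""
    actual = parse_method_details(output)
    return {name: actual.get(name) for name, value in expected.items()
            if actual.get(name) != str(value)}


if __name__ == "__main__":
    sample = "[ RKEY : true ] [ FREQ : true ] [ SmallN : 10 ]"
    print(check_properties(sample, {"RKEY": "true", "SmallN": 5}))  # prints {'SmallN': '10'}
```

An empty dictionary from `check_properties` means every expected property was found with the expected value.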
Continuous Perturbation
The original perturbation method applies to discrete variables only. Continuous perturbation applies to continuous variables such as income, and can operate on both weighted and unweighted databases.
To use continuous perturbation, you must have:
- RKEYs in the unit records.
- An FTable stored in a CSV file:
  - This file must have exactly 2 columns.
  - The first column represents the rank (with respect to TOPN) and must be an integer, starting at 1 in the first row and incrementing by 1 for each additional row (1, 2, 3, and so on).
  - The second column represents the scaling factor and must be a number (it can be a floating point value).
- A CTable stored in a CSV file:
  - This file must have exactly 256 rows and at least 32 columns.
  - All values must be numbers.
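The FTable and CTable requirements listed above can be checked before you deploy the files. This sketch enforces only those stated rules. It assumes plain comma-separated files with no header row and treats blank lines as ignorable, which SuperSERVER itself may not do. The function names are hypothetical.

```python
import csv
import io


def _is_number(text):
    """True if text parses as an integer or floating point number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def _read_rows(text):
    # Blank lines are skipped here; that tolerance is an assumption.
    return [row for row in csv.reader(io.StringIO(text)) if row]


def validate_ftable(text):
    """Return a list of problems in FTable CSV content (empty list = valid)."""
    rows = _read_rows(text)
    if not rows:
        return ["file is empty"]
    problems = []
    for i, row in enumerate(rows, start=1):
        if len(row) != 2:
            problems.append(f"row {i}: expected 2 columns, found {len(row)}")
            continue
        rank, factor = (cell.strip() for cell in row)
        # Ranks must run 1, 2, 3, ... matching the row number.
        if not rank.isdigit() or int(rank) != i:
            problems.append(f"row {i}: rank must be {i}, found {rank!r}")
        if not _is_number(factor):
            problems.append(f"row {i}: scaling factor {factor!r} is not a number")
    return problems


def validate_ctable(text):
    """Return a list of problems in CTable CSV content (empty list = valid)."""
    rows = _read_rows(text)
    problems = []
    if len(rows) != 256:
        problems.append(f"expected 256 rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        if len(row) < 32:
            problems.append(f"row {i}: expected at least 32 columns, found {len(row)}")
        bad = [cell for cell in row if not _is_number(cell.strip())]
        if bad:
            problems.append(f"row {i}: non-numeric value {bad[0]!r}")
    return problems
```

To check a file on disk, open it with `newline=""`, read its contents, and pass the string to the matching function.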
Apply the Plugin to a Database
The following examples show how to configure continuous perturbation for unweighted and weighted databases:
Unweighted Database
Configure the method:
```
> method addmethod cont_perturbation_method
> method cont_perturbation_method adddcplugin perturbation Perturbation
> method cont_perturbation_method perturbation addproperty RKEY "true"
> method cont_perturbation_method perturbation addproperty FREQ "true"
> method cont_perturbation_method perturbation addproperty TOPN "2"
> method cont_perturbation_method perturbation addproperty TOPN_RKEY "2"
> method cont_perturbation_method perturbation addproperty SMALLC "5"
> method cont_perturbation_method perturbation addproperty "FTABLE" "C:\perturbation\ftable.csv"
> method cont_perturbation_method perturbation addproperty "CTABLE" "C:\perturbation\ctable.csv"
```
Notes:
- `TOPN_RKEY` is independent of `TOPN` but must be set to the same value. SuperSERVER processes a list of measure and RKEY pairs, ranks the RKEYs in descending order of the associated measure, and then selects the top n RKEYs.
- The `SMALLC` property is optional. It defaults to 5 if not specified.
- If you do not set the `FTABLE` or `CTABLE` properties, they default to a file in the same location as the SXV4 database, with the extension .ftable or .ctable respectively.
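The TOPN selection described in the first note can be sketched as follows. The function name is hypothetical. Ties keep their input order because Python's sort is stable (SuperSERVER's tie handling is not documented here), and the rank is returned only because the FTable's first column is indexed by rank; how SuperSERVER then applies the scaling factor is not covered in this section.

```python
def top_n_rkeys(pairs, n):
    """Rank (measure, rkey) pairs by descending measure and keep the top n.

    Returns a list of (rank, rkey) tuples with ranks starting at 1.
    Ties keep their input order because sorted() is stable (an assumption
    about tie handling; SuperSERVER's behaviour is not documented here).
    """
    ranked = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    return [(rank, rkey) for rank, (_, rkey) in enumerate(ranked[:n], start=1)]


if __name__ == "__main__":
    # (income, rkey) pairs for the records in one hypothetical cell
    cell = [(120.0, 7), (560.5, 3), (90.0, 11), (300.0, 5)]
    print(top_n_rkeys(cell, 2))  # prints [(1, 3), (2, 5)]
```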
Assign the method to a database (in this example, the database ID is `bank`):
```
> cat bank addmethod cont_perturbation_method
```
Weighted Database
Configure the method:
```
> method addmethod cont_perturbation_method
> method cont_perturbation_method adddcplugin perturbation Perturbation
> method cont_perturbation_method perturbation addproperty RKEY "true"
> method cont_perturbation_method perturbation addproperty FREQ "true"
> method cont_perturbation_method perturbation addproperty TOPN "2"
> method cont_perturbation_method perturbation addproperty TOPN_RKEY "2"
> method cont_perturbation_method perturbation addproperty TOPN_MAIN_WEIGHT "2"
> method cont_perturbation_method perturbation addproperty SMALLC "5"
> method cont_perturbation_method perturbation addproperty "FTABLE" "C:\perturbation\ftable.csv"
> method cont_perturbation_method perturbation addproperty "CTABLE" "C:\perturbation\ctable.csv"
```
Notes:
- `TOPN_RKEY` and `TOPN_MAIN_WEIGHT` must be set to the same value as `TOPN`.
- The `SMALLC` property is optional. It defaults to 5 if not specified.
- If you do not set the `FTABLE` or `CTABLE` properties, they default to a file in the same location as the SXV4 database, with the extension .ftable or .ctable respectively.
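The notes for the weighted and unweighted configurations can be turned into a pre-flight check. This hypothetical helper applies the documented defaults for SMALLC, FTABLE and CTABLE, and reports any TOPN_RKEY or TOPN_MAIN_WEIGHT value that differs from TOPN. Property values are compared as strings, matching how they are entered in the addproperty commands; the database file name in the usage example is invented.

```python
from pathlib import PureWindowsPath


def resolve_continuous_properties(props, database_file, weighted):
    """Apply documented defaults and check the TOPN rules.

    Returns (resolved_properties, problems). All names here are hypothetical;
    only the defaults and the TOPN rule come from the notes above.
    """
    resolved = dict(props)
    resolved.setdefault("SMALLC", "5")
    db = PureWindowsPath(database_file)
    # FTABLE/CTABLE default to the database location with a new extension.
    resolved.setdefault("FTABLE", str(db.with_suffix(".ftable")))
    resolved.setdefault("CTABLE", str(db.with_suffix(".ctable")))

    # Weighted databases also tie TOPN_MAIN_WEIGHT to TOPN.
    linked = ["TOPN_RKEY", "TOPN_MAIN_WEIGHT"] if weighted else ["TOPN_RKEY"]
    topn = resolved.get("TOPN")
    problems = [
        f"{name} must equal TOPN ({topn}), found {resolved.get(name)}"
        for name in linked
        if str(resolved.get(name)) != str(topn)
    ]
    return resolved, problems


if __name__ == "__main__":
    db = r"C:\ProgramData\STR\SuperSERVER SA\databases\bank.sxv4"
    _, issues = resolve_continuous_properties(
        {"TOPN": "2", "TOPN_RKEY": "2"}, db, weighted=True)
    print(issues)  # prints ['TOPN_MAIN_WEIGHT must equal TOPN (2), found None']
```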
Assign the method to a database (in this example, the database ID is `bank`):
```
> cat bank addmethod cont_perturbation_method
```