Quantiles and Ranges - Perturbation
In addition to perturbing counts, it is important to ensure that your summation options (measures) do not allow individuals to be identified. This is particularly important if you have "sensitive" measures, where outlying values in the data might allow identification of specific individuals. One example might be salary information, where the CEO can potentially be identified because this individual's salary is significantly higher than that of any other employee.
Both SuperWEB2 and SuperCROSS allow users to create ranges and quantiles from your summation options:
- In SuperWEB2, users can access the median for a summation option if you have enabled this. In addition, they can create ranges and quantiles using the Range button.
- In SuperCROSS, users can create User Defined Fields for ranges and quantiles. They can also access the median and various percentiles from the Define Recode window.
When you have sensitive measures, it is important to ensure that the quantile and range options do not allow individuals to be identified.
When you activate perturbation, quantile perturbation is also activated automatically, but it is not configured by default. This means that unless you add the configuration, all quantiles will automatically be disabled.
If you do not want to use quantile perturbation on your system (i.e., you want to allow users to create all quantiles), follow the steps in the "Disable Quantile Perturbation" section below.
Quantile Perturbation
Quantile perturbation adjusts the size of each quantile to avoid revealing sensitive information:
- SuperSERVER first calculates the fractional quantile or percentile position (i.e., a number between 0 and 1). For a median there are two quantile groups, so this number would be 0.5.
- It then perturbs this number before working out the quantile boundary of the percentile, so it effectively works out another percentile:
p' = p + perturbation_factor / number_of_contributors
The perturbation factor is used to determine how many values to move forwards or backwards from the original boundary of the percentile. For example, if the population is 1,000 and we want the median value (two quantile ranges), then without perturbation the median would be the 500th (or 501st) value. If the perturbation factor is -5, then the fraction would be adjusted down to 0.495, and SuperSERVER would return the 495th value as the median instead.
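The arithmetic in this example can be reproduced with a minimal Python sketch. The function names are hypothetical, and rounding the perturbed fraction to the nearest rank is an assumption made for illustration; the source does not state how SuperSERVER converts the fraction into a position.

```python
def perturbed_fraction(p: float, perturbation_factor: int, contributors: int) -> float:
    """Return p' = p + perturbation_factor / number_of_contributors."""
    if contributors <= 0:
        raise ValueError("contributors must be positive")
    return p + perturbation_factor / contributors


def boundary_rank(fraction: float, contributors: int) -> int:
    """Convert a fractional position into a 1-based rank in the sorted values.

    Assumption: the nearest rank is used.
    """
    return round(fraction * contributors)


# Median (p = 0.5) of 1,000 contributors with a perturbation factor of -5.
p_prime = perturbed_fraction(0.5, -5, 1000)
rank = boundary_rank(p_prime, 1000)
print(f"{p_prime:.3f}", rank)  # prints "0.495 495"
```

A factor of zero leaves the fraction unchanged, so the unperturbed boundary is returned.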
Configure Quantile Perturbation
When perturbation is configured, quantile perturbation is switched on automatically. To configure the settings, you need three quantile perturbation configuration files for each SXV4.
By default, these must be located in the same directory as the SXV4 file, although you can configure an alternative location for the files if necessary. See below for details.
<sxv4_filename>.sxv4.quantile_validation.csv
This file determines how quantile ranges and percentile summations can be used, and sets the minimum values for allowing the quantile with or without perturbation.
It is in CSV format, and contains the following columns:
Column | Description |
---|---|
Number of Ranges | The number of ranges in the quantile. Users will not be able to generate quantiles unless they are listed in this column. |
Minimum number of cells | The minimum number of records required to allow this quantile to be generated. |
Perturbation Threshold | The threshold above which no perturbation will be applied. If the number of contributing records is above this value then there will be no quantile perturbation. To have no perturbation at all, set this value to be the same as the previous column. |
Description and Comments | (Optional). This column allows you to add comments to the file. The comments are not shown to users. |
The following example allows only quantiles with 2, 4, 5 and 10 ranges, and sets the minimum numbers of cells and the threshold value for each:
Number of Ranges, Minimum number of cells, Perturbation Threshold, Description and Comments
2,500,50000, "Two ranges. Also for median"
4,10,100000
5,100,100000
10,1000,100000
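To illustrate how these rows could be interpreted, the following Python sketch parses the example above and classifies a request. It is not SuperSERVER's implementation: the function names are hypothetical, and treating a record count exactly equal to the threshold as unperturbed is an assumption (the column description only says "above").

```python
import csv
import io

EXAMPLE = """\
Number of Ranges, Minimum number of cells, Perturbation Threshold, Description and Comments
2,500,50000, "Two ranges. Also for median"
4,10,100000
5,100,100000
10,1000,100000
"""


def load_rules(text: str) -> dict[int, tuple[int, int]]:
    """Map number of ranges -> (minimum records, perturbation threshold)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    rules = {}
    for row in reader:
        if row:  # ignore blank lines
            rules[int(row[0])] = (int(row[1]), int(row[2]))
    return rules


def decide(rules: dict[int, tuple[int, int]], ranges: int, records: int) -> str:
    """Classify a quantile request as disallowed, perturbed, or unperturbed."""
    if ranges not in rules:
        return "disallowed"  # quantile size is not listed in the file
    minimum, threshold = rules[ranges]
    if records < minimum:
        return "disallowed"  # too few contributing records
    # Assumption: a count exactly at the threshold counts as unperturbed.
    if records >= threshold:
        return "unperturbed"
    return "perturbed"


rules = load_rules(EXAMPLE)
print(decide(rules, 2, 1000))  # prints "perturbed"
print(decide(rules, 3, 1000))  # prints "disallowed"
```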
<sxv4_filename>.sxv4.quantile_perturbation.csv
This file contains the perturbation factors, which can be positive, zero, or negative. It is in CSV format with 128 rows and 100 columns.
Each column is for a successive percentile, so the first column is for the 1st percentile, and the 50th column is for the 50th percentile (i.e. the median).
For example:
3,-4,-2,3,4, ...
-1,2,-2,0,-4,
-4,4,-1,-3,-3,
3,0,-2,1,2,2,
-4,4,2,3,4,3,
-2,2,0,-2,-4,-4,
...
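Because the file must have exactly 128 rows and 100 columns, it can be useful to check its shape before deploying it. The following Python sketch is one way to do that. The function name is hypothetical, and the requirement that every factor is an integer is an assumption based on the example values above.

```python
import csv
import io

EXPECTED_ROWS = 128
EXPECTED_COLUMNS = 100


def check_ptable(text: str) -> list[str]:
    """Return a list of problems; an empty list means the shape is as documented."""
    problems = []
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) != EXPECTED_ROWS:
        problems.append(f"expected {EXPECTED_ROWS} rows, found {len(rows)}")
    for number, row in enumerate(rows, start=1):
        if len(row) != EXPECTED_COLUMNS:
            problems.append(
                f"row {number}: expected {EXPECTED_COLUMNS} columns, found {len(row)}"
            )
            continue
        for cell in row:
            try:
                int(cell)  # assumption: factors are whole numbers
            except ValueError:
                problems.append(f"row {number}: non-integer value {cell!r}")
                break
    return problems
```

To check a file on disk, read its contents with `open(path, newline="").read()` and pass the result to `check_ptable`.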
<sxv4_filename>.sxv4.quantile_config.properties
This file provides other configuration settings for quantile perturbation. It currently contains one setting:
Setting | Description |
---|---|
RSEPerturbationFactor | This setting accounts for the effect on RSE (Relative Standard Error) for surveys that are configured with weighting. It adjusts the jackknife variance used to calculate the RSE. If you are not using weighted surveys (or you do not want to use quantile perturbation), set the RSEPerturbationFactor to 0. |
For example:
RSEPerturbationFactor = 0
Configure the Location of the Quantile Perturbation Files
By default, SuperSERVER expects the three quantile perturbation configuration files to be located in the same directory as the SXV4 file. If you wish, you can configure an alternative location for the files using the following module properties:
Property | Configuration file |
---|---|
QUANTILEVALIDATION | quantile_validation.csv |
QUANTILEPTABLE | quantile_perturbation.csv |
QUANTILECONFIG | quantile_config.properties |
To set the location, specify the full path and filename of the configuration file you want to use. Any backslashes in the path will need to be escaped with an additional backslash (forward slashes can also be used but do not need to be escaped).
For example:
method perturbation_method perturbation addproperty QUANTILEVALIDATION "C:\\my\\path\\my.sxv4.quantile_validation.csv"
method perturbation_method perturbation addproperty QUANTILEPTABLE "C:\\my\\path\\my.sxv4.quantile_perturbation.csv"
method perturbation_method perturbation addproperty QUANTILECONFIG "C:\\my\\path\\my.sxv4.quantile_config.properties"
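If you need to produce these commands for several SXV4s, the escaping can be scripted. The following Python sketch builds the three lines shown above from a directory and an SXV4 name. The function name is hypothetical, and the `method perturbation_method perturbation` prefix is copied verbatim from the example, so adjust it if your method is named differently.

```python
from pathlib import PureWindowsPath

# Module property -> file name suffix, as listed above.
PROPERTIES = {
    "QUANTILEVALIDATION": "quantile_validation.csv",
    "QUANTILEPTABLE": "quantile_perturbation.csv",
    "QUANTILECONFIG": "quantile_config.properties",
}


def addproperty_commands(directory: str, sxv4_name: str) -> list[str]:
    """Build one addproperty command per configuration file."""
    commands = []
    for prop, suffix in PROPERTIES.items():
        path = str(PureWindowsPath(directory) / f"{sxv4_name}.sxv4.{suffix}")
        escaped = path.replace("\\", "\\\\")  # double every backslash
        commands.append(
            f'method perturbation_method perturbation addproperty {prop} "{escaped}"'
        )
    return commands


for command in addproperty_commands("C:\\my\\path", "my"):
    print(command)
```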
Disable Quantile Perturbation
Quantile perturbation is enabled automatically when you activate perturbation. However, none of the configuration files described above is created by default, so quantiles are initially disabled for all of your SXV4s until you create and add the three configuration files.
If you do not want to apply quantile perturbation for some or all of your SXV4s, then you need to add the three configuration files for each of your SXV4s, as follows:
To help you with this configuration, we have provided examples of the files as they need to be set up to disable quantile perturbation. Click the links below to download these examples, then simply make as many copies as you need, rename them so they include the name of your SXV4 in the filename, and copy to the same directory as the SXV4 file(s).
File | Contents required to disable quantile perturbation | Example |
---|---|---|
<sxv4_filename>.sxv4.quantile_validation.csv | Ensure this file allows all quantiles from 2 to 10 ranges, and sets the minimum number of cells and the perturbation threshold to 1 in all cases, so that no perturbation is applied. | Download Example |
<sxv4_filename>.sxv4.quantile_perturbation.csv | Make sure this file contains 128 rows and 100 columns with all the values set to zero. | Download Example |
<sxv4_filename>.sxv4.quantile_config.properties | Set the value of the RSEPerturbationFactor setting to 0. | Download Example |
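As an alternative to copying and renaming files by hand, the three files can be generated with a short script. The following Python sketch builds the contents described in this section. The function names are hypothetical, and including a header row in the validation file and setting `RSEPerturbationFactor = 0` are assumptions based on the examples earlier on this page.

```python
from pathlib import Path

HEADER = (
    "Number of Ranges, Minimum number of cells, "
    "Perturbation Threshold, Description and Comments"
)


def disable_file_contents(sxv4_name: str) -> dict[str, str]:
    """Return {file name: contents} for the three 'disable' files."""
    # Allow 2 to 10 ranges, with minimum cells and threshold both set to 1.
    validation = [HEADER] + [f"{ranges},1,1" for ranges in range(2, 11)]
    # 128 rows x 100 columns of zero perturbation factors.
    zero_row = ",".join(["0"] * 100)
    ptable = [zero_row] * 128
    prefix = f"{sxv4_name}.sxv4"
    return {
        f"{prefix}.quantile_validation.csv": "\n".join(validation) + "\n",
        f"{prefix}.quantile_perturbation.csv": "\n".join(ptable) + "\n",
        f"{prefix}.quantile_config.properties": "RSEPerturbationFactor = 0\n",
    }


def write_disable_files(directory: Path, sxv4_name: str) -> None:
    """Write the three files next to the SXV4 (the default location)."""
    for name, contents in disable_file_contents(sxv4_name).items():
        (directory / name).write_text(contents, encoding="utf-8")
```

For example, `write_disable_files(Path("C:/my/path"), "my")` would write the files for `my.sxv4` into that directory.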
Ranges
It is also important to make sure that users cannot create ranges that are small enough to allow individuals to be identified. You can use the ranges command in SuperADMIN to control the minimum and maximum acceptable values for ranges, as well as the minimum increment.