Replicate Weights

Replicate weights are a series of variables that contain the information necessary for correctly computing the standard errors of point estimates when analysing survey data.

When conducting a survey, it is typically too impractical or expensive to collect data from either the entire population or a simple random sample. As a result, most statistical functions cannot be used directly to analyse the results. A form of weighting is needed to adjust the results so that statistical formulas can be used.

Where standard errors are concerned, there are two possible ways of making this correction. One way is called a Taylor series linearization method, and the other is called the replicate weight method. Before we can explain what the replicate weights are, we need to first understand some common elements found in many survey data sets (especially older data sets). These elements are used in the Taylor series linearization method.

Common Elements of Survey Datasets

Most people do not conduct their own surveys with sampling designs. Rather, they use survey data that some agency or company collected and made available to the public. The documentation must be read carefully to find out what kind of sampling design was used to collect the data. This is very important because many of the estimates and standard errors are calculated differently for the different sampling designs. Hence, if you misspecify the sampling design, the point estimates and standard errors will likely be wrong.

Below are some common features of many sampling designs.

Weights

There are many types of weights that can be associated with a survey.

Perhaps the most common is the sampling weight, sometimes called a pweight, which is the inverse of the probability of being included in the sample due to the sampling design (except for a certainty PSU, see below). In the simplest case, the pweight is calculated as N/n, where N = the number of elements in the population and n = the number of elements in the sample. For example, if a population has 10 elements and 3 are sampled at random with replacement, then the pweight would be 10/3 = 3.33. In a two-stage design, the pweight is calculated as f1 × f2, where f1 is the inverse of the sampling fraction for the first stage and f2 is the inverse of the sampling fraction for the second stage. Under many sampling plans, the sum of the pweights will equal the population total.
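As a rough illustration of the arithmetic above, the following Python sketch computes a single-stage pweight and a two-stage pweight. The function name pweight and the two-stage counts (5 of 50 districts, then 4 of 20 schools per district) are assumptions made up for this example; they do not come from any particular survey.

```python
def pweight(population_size, sample_size):
    """Sampling weight N/n, the inverse of the sampling fraction n/N."""
    return population_size / sample_size

# Single-stage example from the text: 3 elements sampled from a population of 10.
w = pweight(10, 3)
print(round(w, 2))  # 3.33

# Hypothetical two-stage design (assumed numbers):
# stage 1 samples 5 of 50 districts, stage 2 samples 4 of 20 schools per district.
f1 = pweight(50, 5)    # inverse sampling fraction, stage 1
f2 = pweight(20, 4)    # inverse sampling fraction, stage 2
w_two_stage = f1 * f2  # weight carried by each sampled school

# Under this design, the weights of the 5 * 4 sampled schools sum to the
# population total of 50 * 20 schools.
n_sampled_schools = 5 * 4
total_schools = n_sampled_schools * w_two_stage
print(w_two_stage, total_schools)
```

In this assumed design each sampled school stands for 50 schools, and the 20 sampled schools together account for all 1,000 schools in the population.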

Primary Sampling Unit (PSU)

This is the first unit that is sampled in the design. For example, school districts from California may be sampled and then schools within districts may be sampled. The school district would be the PSU. If states from the US were sampled, and then school districts from within each state, and then schools from within each district, then states would be the PSU.

One does not need to use the same sampling method at all levels of sampling. For example, probability-proportional-to-size sampling may be used at level 1 (to select states), while cluster sampling is used at level 2 (to select school districts). In the case of a simple random sample, the PSUs and the elementary units are the same.

Strata

Stratification is a method of breaking up the population into different groups, often by demographic variables such as gender, race or SES. Once these groups have been defined, one samples from each group as if it were independent of all of the other groups.

For example, if a sample is to be stratified on gender, men and women would be sampled independently of one another. This means that the pweights for the men will likely be different from the pweights for the women. In most cases, you need to have two or more PSUs in each stratum. The purpose of stratification is to improve the precision of the estimates, and stratification works most effectively when the variance of the dependent variable is smaller within the strata than in the sample as a whole.

Finite Population Correction (FPC)

This is used when the sampling fraction, the number of elements or respondents sampled relative to the population, becomes large.

The FPC is used in the calculation of the standard error of the estimate. If the value of the FPC is close to 1, it will have little impact and can be safely ignored.

The formula for calculating the FPC is ((N - n)/(N - 1))^(1/2), that is, the square root of (N - n)/(N - 1), where N is the number of elements in the population and n is the number of elements in the sample. To see the impact of the FPC for samples of various proportions, suppose that you had a population of 10,000 elements:

Sample Size (n)	FPC
1	1.0000
10	.9995
100	.9950
500	.9747
1000	.9487
5000	.7071
9000	.3162
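The values in the table can be checked with a few lines of Python. This is a minimal sketch, taking the FPC to be the square root of (N - n)/(N - 1); the function name fpc is chosen here for illustration.

```python
import math

def fpc(N, n):
    """Finite population correction: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

N = 10_000  # population size used in the table
sample_sizes = (1, 10, 100, 500, 1000, 5000, 9000)

# Map each sample size to its FPC, rounded to four decimals as in the table.
table = {n: round(fpc(N, n), 4) for n in sample_sizes}

for n, value in table.items():
    print(f"{n:>5}\t{value:.4f}")
```

Only when the sample is a sizeable share of the population (here, 5,000 or more of 10,000) does the correction shrink the standard error noticeably.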

Sampling with and without Replacement

Most samples collected in the real world are collected "without replacement". This means that once a respondent has been selected to be in the sample and has participated in the survey, that particular respondent cannot be selected again to be in the sample. Many of the calculations change depending on whether a sample is collected with or without replacement.

Calculating Replicate Weights

There are several ways to create replicate weights. However, they are all based on a similar underlying logic:

The sample is broken up into subsamples, called replicates.
The estimate of interest is calculated from both the full sample and from each replicate.
The differences between the estimate from the full sample and the estimate from each replicate are used to estimate the variance, and hence the standard error, of the estimate.

Different methods of creating the subsamples yield the different types of replicate weights, for example balanced repeated replication (BRR), jackknife (JK-1, JK-2 and JK-n) and successive differences.
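To make the underlying logic concrete, here is a minimal Python sketch of one of these methods, the delete-1 jackknife, for an unstratified sample in which every observation is its own PSU. It assumes the usual JK-1 construction: each replicate sets one unit's weight to zero and scales the remaining weights by n/(n - 1), and the variance is (n - 1)/n times the sum of squared differences between the replicate estimates and the full-sample estimate. The function names and the data are made up for illustration; real survey files normally ship the replicate weights ready-made, along with the multiplier to use.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jk1_replicate_weights(weights):
    """One replicate per unit: drop that unit, scale the rest by n/(n-1)."""
    n = len(weights)
    scale = n / (n - 1)
    return [
        [0.0 if i == r else w * scale for i, w in enumerate(weights)]
        for r in range(n)
    ]

def jk1_standard_error(values, weights, estimator=weighted_mean):
    """Standard error from the spread of replicate estimates."""
    n = len(weights)
    full = estimator(values, weights)  # estimate from the full sample
    reps = [estimator(values, rw) for rw in jk1_replicate_weights(weights)]
    variance = (n - 1) / n * sum((r - full) ** 2 for r in reps)
    return math.sqrt(variance)

# Made-up data: five observations, each with a pweight of 10.
y = [2.0, 4.0, 4.0, 6.0, 9.0]
w = [10.0] * 5
se = jk1_standard_error(y, w)
print(round(se, 4))  # 1.1832
```

For an equally weighted mean, this reproduces the textbook standard error s/sqrt(n) (here sqrt(7/5)), which is a useful sanity check. The same function works for other estimators by passing a different estimator argument.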

The choice of what kind of replicate weight to create is determined by the type of sampling design that was used to collect the data (in particular, whether or not stratification was used and how many PSUs were in each stratum):

If stratification was not used, then the appropriate replicate weight method would be jackknife delete-1.
If stratification was used and there were exactly two PSUs per stratum, then either BRR (or BRR with Fay's correction) or jackknife delete-2 could be used.
If there were more than two PSUs per stratum, jackknife delete-n would be used.
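The three rules above can be written as a small decision function. This is only a sketch of the rule of thumb as stated here; the function name and its arguments are hypothetical, and the survey's own documentation should always take precedence.

```python
def suggest_replicate_method(stratified, psus_per_stratum=None):
    """Return the replicate weight method suggested by the rules in the text.

    stratified: whether the sampling design used stratification.
    psus_per_stratum: number of PSUs in each stratum (ignored if unstratified).
    """
    if not stratified:
        return "jackknife delete-1"
    if psus_per_stratum is None or psus_per_stratum < 2:
        # The text notes that most designs need two or more PSUs per stratum.
        raise ValueError("a stratified design needs at least two PSUs per stratum")
    if psus_per_stratum == 2:
        return "BRR (or BRR with Fay's correction) or jackknife delete-2"
    return "jackknife delete-n"

print(suggest_replicate_method(stratified=False))
print(suggest_replicate_method(stratified=True, psus_per_stratum=2))
print(suggest_replicate_method(stratified=True, psus_per_stratum=5))
```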

Besides protecting the privacy of survey respondents, the replicate weight method has other advantages. One is that the replicate weights can include information beyond the strata and PSUs. Many surveys have corrections to the pweight to account for nonresponse, poststratification and/or raking to known totals, such as current Census figures. The effects of these adjustments can be incorporated into the replicate weights. Of course, there are some disadvantages to the replicate weight method. One is seen in extremely large data sets that have a huge number of replicate weights. Another disadvantage has to do with the calculation of nonlinear statistics, such as ratios and quantiles. If the number of strata is small, there is a possibility of bias.