DHS users should be aware that, in many cases, the data must be weighted. The following describes how DHS weights are constructed and when they should be used.
Sampling weights are adjustment factors applied to each case in tabulations to adjust for differences in the probability of selection and interview between cases in a sample, whether due to design or happenstance. In DHS surveys, the sample is often selected with unequal probability to expand the number of cases available (and hence reduce sampling variability) for certain areas or subgroups for which statistics are needed. In this case, weights need to be applied when statistics are tabulated to produce the proper representation. When weights are calculated because of sample design, corrections for differential response rates are also made.
There are two main sampling weights in DHS surveys: household weights and individual weights. The household weight for a particular household is the inverse of its household selection probability multiplied by the inverse of the household response rate of its household response rate group. The individual weight of a respondent’s case is the household weight multiplied by the inverse of the individual response rate of her individual response rate group. There may be additional sampling weights for sample subsets, such as male surveys, anthropometry, biomarkers, etc. Additional sample weights are needed only if the subsamples are selected with differential probabilities. For example, if one in five households is selected for biomarkers throughout the whole sample, then an additional sample weight is not necessary. However, if one in five households in urban areas and one in two households in rural areas are selected, then an additional sample weight is necessary when estimating national levels or levels for any group that includes cases from both urban and rural areas. Notwithstanding the foregoing, the DHS has customarily included both household weights and individual weights in the men’s surveys (modules), even when no differential subselection has been used. These weights are normalized to the number of households in the men’s survey subset and to the number of men’s individual interviews, respectively.
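The two definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the example probabilities and response rates are hypothetical and are not taken from any DHS survey or software.

```python
def household_weight(selection_prob: float, hh_response_rate: float) -> float:
    """Initial (unstandardized) household weight.

    Inverse of the household selection probability times the inverse of the
    household response rate of the household's response rate group.
    """
    if not (0 < selection_prob <= 1 and 0 < hh_response_rate <= 1):
        raise ValueError("probability and response rate must be in (0, 1]")
    return (1.0 / selection_prob) * (1.0 / hh_response_rate)


def individual_weight(hh_weight: float, ind_response_rate: float) -> float:
    """Initial individual weight: household weight times the inverse of the
    individual response rate of the respondent's response rate group."""
    if not 0 < ind_response_rate <= 1:
        raise ValueError("response rate must be in (0, 1]")
    return hh_weight / ind_response_rate


# Hypothetical example values, for illustration only.
hw = household_weight(selection_prob=0.002, hh_response_rate=0.95)
iw = individual_weight(hw, ind_response_rate=0.90)
print(hw, iw)
```

Because response rates are at most 1, the individual weight is never smaller than the household weight it is built from.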
Response rate groups are groups of cases for which response rates are calculated. In DHS surveys, households and individuals are grouped into sample domains and response rates are calculated for each domain.
Household response rate:
A. Coverage: Excluded are dwellings without a household (no household lives in the dwelling, address is not a dwelling, or the dwelling is destroyed).
B. Numerator: Number of households with a completed household interview.
C. Denominator: Sum of number of households with a completed household interview, households that live in the dwelling but no competent respondent was at home, households with permanently postponed or refused interviews, and households for which the dwelling was not found.
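As a rough sketch, the household response rate defined by the numerator and denominator above could be computed as follows. The function name, argument names, and counts are hypothetical. Dwellings excluded under the coverage rule (vacant, not a dwelling, destroyed) are assumed to have been dropped before counting.

```python
def household_response_rate(completed: int,
                            no_competent_respondent: int,
                            postponed_or_refused: int,
                            dwelling_not_found: int) -> float:
    """Completed household interviews divided by all in-scope households."""
    denominator = (completed
                   + no_competent_respondent
                   + postponed_or_refused
                   + dwelling_not_found)
    if denominator == 0:
        raise ValueError("no in-scope households in this response rate group")
    return completed / denominator


# Hypothetical counts for one response rate group.
rate = household_response_rate(completed=950,
                               no_competent_respondent=20,
                               postponed_or_refused=10,
                               dwelling_not_found=20)
print(rate)
```

The women's individual response rate follows the same pattern, with its own outcome categories in the denominator.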
Women's individual response rate:
A. Coverage: Women eligible for interview, usually women age 15–49 who slept in the household the night before the survey. In ever-married samples, women are eligible for interview only if they have ever been married or lived in a consensual union. In some surveys, the age range of eligibility has differed, e.g., all ever-married women age 12–49.
B. Numerator: Number of eligible women with a completed individual interview.
C. Denominator: Sum of number of eligible women with a completed individual interview, eligible women not interviewed because they were not at home, eligible women with permanently postponed or refused interviews, eligible women with partially completed interviews, eligible women for whom an interview could not be completed due to incapacitation and for other reasons.
Coverage: The age ranges and eligibility criteria have varied for men. Check the survey documentation.
Initial sample weights are produced by the DHS sampler using the sample selection probabilities of each household and the response rates for households and for individuals. The initial weights are then standardized by dividing each weight by the average of the initial weights (equal to the sum of the initial weights divided by the number of cases) so that the sum of the standardized weights equals the number of cases over the entire sample. The standardization is done separately for each weight.
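The standardization step can be illustrated with a minimal Python sketch. The function name and the four initial weights are made up for the example, and this is not the DHS sampler's actual program.

```python
def standardize(initial_weights):
    """Divide each initial weight by the mean initial weight."""
    n = len(initial_weights)
    if n == 0:
        raise ValueError("no cases to standardize")
    mean_weight = sum(initial_weights) / n
    return [w / mean_weight for w in initial_weights]


# Hypothetical initial weights for a four-case sample.
initial = [500.0, 500.0, 1000.0, 2000.0]
standardized = standardize(initial)

# Over the whole sample, the standardized weights sum to the number of cases.
total = sum(standardized)

# Over a subset (here the first two cases), they generally do not.
subset_total = sum(standardized[:2])
print(standardized, total, subset_total)
```

Here the mean initial weight is 1000, so the standardized weights sum to 4 for the whole sample, while the first two cases together carry a weight of 1 rather than 2.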
Sample weights are calculated to six decimals but are presented in the standard recode files without the decimal point. They need to be divided by 1,000,000 before use to approximate the number of cases.
In tabulation programs, sampling weights need to be applied through the use of special commands.
Examples:
a) In SPSS using the WEIGHT command with the weight variable:
COMPUTE rweight = V005/1000000.
WEIGHT BY rweight.
b) In ISSA using the weight parameter:
rweight = V005/1000000
x = xtab(table1, rweight).
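For readers working outside SPSS or ISSA, the same idea can be sketched in plain Python: rescale V005 by 1,000,000 and use the result as the weight when tabulating. The record layout, the `urban` field, and the data values are hypothetical. Only the variable name V005 and the divisor come from the text above.

```python
# Hypothetical records: V005 is stored without its decimal point.
records = [
    {"V005": 500000, "urban": True},
    {"V005": 500000, "urban": True},
    {"V005": 1000000, "urban": False},
    {"V005": 2000000, "urban": False},
]


def weighted_percent(rows, flag):
    """Weighted percentage of rows for which rows[i][flag] is true."""
    weights = [row["V005"] / 1_000_000 for row in rows]  # restore six decimals
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    selected = sum(w for w, row in zip(weights, rows) if row[flag])
    return 100.0 * selected / total


unweighted = 100.0 * sum(row["urban"] for row in records) / len(records)
weighted = weighted_percent(records, "urban")
print(unweighted, weighted)
```

In this made-up sample, half of the cases are urban, but they carry only a quarter of the total weight, so the weighted percentage is lower than the unweighted one.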
1. The sum of the sample weights only equals the number of cases for the entire sample and not for subgroups such as urban and rural areas.
2. Where there are no differential selection probabilities, weights may not have been calculated, since weights based only on response rates usually make little difference to results.
3. Use of sample weights is appropriate when representative levels of statistics are desired, such as percentages, means, and medians.
4. Use of sample weights is inappropriate for estimating relationships, such as regression and correlation coefficients.
5. Use of sample weights biases estimates of confidence intervals in most statistical packages since the number of weighted cases is taken to produce the confidence interval instead of the true number of observations. For oversampled areas or groups, use of the sample weights will drastically overestimate sampling variances and confidence intervals for those groups.