Whether you've analyzed DHS data before or are a first-time user, below are some resources to help you analyze DHS data efficiently.
On This Page
Step 1: Select surveys for analysis
Step 2: Review questionnaires
Step 3: Register for dataset access
Step 4: Download datasets
Step 5: Open your dataset
Step 6: Get to know your variables
Step 7: Use sample weights
Step 8: Consider special values
Step 1: Select surveys for analysis
Which surveys are you interested in using? See a list of surveys by country, by type of survey, or by year; search by survey characteristics (for example, surveys that included HIV testing or the Domestic Violence module); or use the full survey search.
Step 2: Review questionnaires
Use the questionnaires to determine:
- whether the information you want to analyze was collected in your survey of interest, and
- who you want to analyze (your unit of analysis).
Step 3: Register for dataset access
Requests to access datasets are usually approved within 24 hours. Once your request has been approved, you will receive an email with instructions for downloading the data.
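Step 4: Download datasets
Each dataset file name is made up of four two-character codes. For a Kenya women's recode file from survey 41, for example:
- The first two letters ("KE") refer to the country, in this case Kenya. A complete list of country codes is available from The DHS Program.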
- The next two letters ("IR") refer to the data file type: IR is the individual (women's) recode file, MR is the men's recode, HR is the household recode, and so on. A complete list of data file types is also available. Based on your review of the questionnaires, select the file type that matches your unit of analysis.
- The next two characters ("41") refer to the phase and number of the survey. The DHS Program publishes a complete explanation of this numbering. If you are analyzing only one survey, all datasets from that survey will share the same two characters.
- The last two letters refer to the software format you want to use. The DT file contains the Stata (.DTA) data file and associated documentation; the SV file contains the SPSS (.SAV) file; the SD file contains the SAS (.SD2) file; and the FL file contains an ASCII (flat) data file and dictionaries.
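Because the naming convention is regular, it can be parsed mechanically. The following is a minimal Python sketch, not a DHS Program tool: the lookup tables contain only the codes described above, the optional extension (such as .ZIP) is an assumption, and splitting the survey code into a phase (first character) and a survey number (second character) is an assumption based on the description of "41". The file name "KEIR41DT.ZIP" is illustrative only.

```python
import re
from dataclasses import dataclass

# Only the codes described in the text above; the real DHS lists are longer.
FILE_TYPES = {
    "IR": "individual (women's) recode",
    "MR": "men's recode",
    "HR": "household recode",
}
FORMATS = {
    "DT": "Stata (.DTA)",
    "SV": "SPSS (.SAV)",
    "SD": "SAS (.SD2)",
    "FL": "ASCII with dictionaries",
}

# Country (2 letters), file type (2 letters), survey code (2 characters),
# format (2 letters), then an optional extension such as ".ZIP" (assumed).
_NAME_RE = re.compile(r"^([A-Z]{2})([A-Z]{2})([0-9A-Z]{2})([A-Z]{2})(?:\.[A-Z0-9]+)?$")


@dataclass(frozen=True)
class DHSFileName:
    country: str
    file_type: str
    survey_code: str
    fmt: str

    @property
    def phase(self) -> str:
        # Assumption: the first character of the survey code is the DHS phase.
        return self.survey_code[0]

    @property
    def survey_number(self) -> str:
        # Assumption: the second character is the survey number within that phase.
        return self.survey_code[1]


def parse_dhs_filename(name: str) -> DHSFileName:
    """Split a DHS-style file name such as 'KEIR41DT' into its four codes."""
    match = _NAME_RE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a DHS-style file name: {name!r}")
    return DHSFileName(*match.groups())


if __name__ == "__main__":
    parsed = parse_dhs_filename("KEIR41DT.ZIP")  # illustrative name
    print(parsed.country, FILE_TYPES.get(parsed.file_type),
          parsed.phase, FORMATS.get(parsed.fmt))
```

Unknown codes are not rejected; `FILE_TYPES.get` and `FORMATS.get` simply return `None`, so the parser keeps working for file types and formats not listed here.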
Step 5: Open your dataset
A note for Stata users: DHS datasets are very large, so if your memory and maximum number of variables (maxvar) settings have not been changed from the defaults, you may get an error message when you try to open one. To fix this, change both settings. To start, try:
    set memory 450m
    set maxvar 10000
You may be able to set these values higher, depending on your computer. These settings should allow you to open a DHS dataset. (In Stata 12 and later, memory is managed automatically and set memory is no longer needed; set maxvar still applies.)
Step 6: Get to know your variables
If you are using an IR, BR, KR, or MR file, check the label of v107 (mv107 in the MR file). The label says "highest year of education." If you analyze this variable assuming it is the respondent's highest year of education overall, your results will be highly misleading. Why? Variable labels must be short, so they cannot give complete information about every variable in the dataset.
Download the DHS Recode Manual and look up v107. It shows that v107 is the highest year of education completed at the level recorded in v106. For example, a respondent whose v106 records secondary education and whose v107 is 2 has completed two years of secondary school, not two years of schooling in total. Had you analyzed v107 as total years of education, you would have seriously underestimated the level of education in the country you are studying. This is just one example of why it is important to use the DHS Recode Manual.
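To see why v107 alone understates education, here is a minimal Python sketch that rebuilds approximate total years of schooling from v106 and v107. It assumes the v106 coding 0 = no education, 1 = primary, 2 = secondary, 3 = higher, and uses a hypothetical `LEVEL_YEARS` table (8 years of primary, 4 of secondary). Check the Recode Manual and the country's education system before using anything like this; the respondents in the demo are made up.

```python
from typing import Optional

# HYPOTHETICAL number of years needed to complete each level before the next
# one starts (8 years primary, 4 years secondary). Replace with the values for
# the education system of the country you are studying.
LEVEL_YEARS = {1: 8, 2: 4}


# Assumed v106 coding: 0 = no education, 1 = primary, 2 = secondary, 3 = higher.
def total_years(v106: int, v107: Optional[int]) -> int:
    """Approximate total schooling: all lower levels plus v107 years at level v106."""
    if v106 == 0:
        return 0  # no education: v107 is not applicable
    if v107 is None:
        raise ValueError("v107 is missing for a respondent with some education")
    years_in_lower_levels = sum(LEVEL_YEARS[level] for level in range(1, v106))
    return years_in_lower_levels + v107


if __name__ == "__main__":
    # Made-up (v106, v107) pairs for four respondents.
    sample = [(0, None), (1, 6), (2, 2), (3, 1)]
    naive = [v107 or 0 for _, v107 in sample]  # wrongly treats v107 as total years
    corrected = [total_years(v106, v107) for v106, v107 in sample]
    print(sum(naive) / len(naive))          # 2.25
    print(sum(corrected) / len(corrected))  # 7.25
```

With these made-up respondents, treating v107 as total years gives a mean of 2.25 years, while the level-adjusted figure is 7.25 years.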
Step 7: Use sample weights
Sample weights in DHS datasets:

| Unit of analysis | Variable |
|---|---|
| Women or children | v005 |
| HIV test results | hiv05 |
Like many other variables in DHS datasets, the weight variable is stored without its decimal point (six decimal places are implied). Analysts need to divide the sampling weight they are using by 1,000,000. Examples:
In Stata:
    generate wgt = v005/1000000
    tab var [iweight=wgt]
In SPSS:
    COMPUTE WGT = V005/1000000.
    WEIGHT BY WGT.
Here var stands for the variable you want to tabulate. These are just examples; other types of weights are available in different software packages.
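The same calculation can be sketched in plain Python, without a statistics package. This is an illustration only: `weighted_tab` is a hypothetical helper that roughly mirrors a weighted one-way tabulation, returning the sum of weights and the weighted percentage for each value. The records and the field name `var` are made up.

```python
from collections import defaultdict


def weighted_tab(records, var, weight_var="v005"):
    """Weighted frequency and percentage for each value of `var`.

    DHS stores weights without the decimal point, so each weight is divided
    by 1,000,000 before use.
    """
    totals = defaultdict(float)
    for rec in records:
        wgt = rec[weight_var] / 1_000_000
        totals[rec[var]] += wgt
    grand_total = sum(totals.values())
    return {
        value: (freq, 100 * freq / grand_total)
        for value, freq in sorted(totals.items())
    }


if __name__ == "__main__":
    # Made-up records: "var" is the variable being tabulated.
    records = [
        {"v005": 1_500_000, "var": 1},
        {"v005": 500_000, "var": 0},
        {"v005": 2_000_000, "var": 1},
    ]
    for value, (freq, pct) in weighted_tab(records, "var").items():
        print(value, freq, f"{pct:.1f}%")
```

For HIV test results, pass `weight_var="hiv05"`, following the table of weight variables. Dividing by 1,000,000 does not change the weighted percentages, only the weighted counts, but the counts are what a weighted table reports, so the division still matters.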
If you're having a problem using DHS data and you've done all of the following:
- Made sure you're using the correct data file
- Made sure you're using the correct weights
- Checked the questionnaire to make sure the question was asked in the way you think it was in your survey
- Checked the DHS Recode Manual
- Checked the DHS Guide to Statistics