MEASURE DHS: Quality Information to plan, monitor and improve population, health, and nutrition programs
Working with Datasets

File Formats -- Recode Data

Recode data files (including HIV and Other Biomarker Test Results) are available in five electronic formats:

  • Hierarchical CSPro File
  • Flat File (ASCII data with syntax file)
  • SPSS System File
  • SAS System File
  • STATA System File

The flat file is the data file format most commonly used by researchers. Files distributed in flat format include SPSS, SAS and STATA data definitions (syntax files). For every available format, the data file and its associated dictionary and documentation are distributed together in a ZIP archive. These zipped dataset files follow a meaningful naming convention. Learn more about dataset filenames.

Hierarchical File Format

Hierarchical files contain a varying number of records for each case and are designed for use with packages that support complex data structures. This is the format used when DHS data are originally entered and saved in CSPro. Each data file is typically 4–35 megabytes in size.

The hierarchical structure defined by CSPro has both advantages and disadvantages. The main advantages are:

  • All the data are stored in just one ASCII file.
  • Because all the data are stored in the same file, it is easy to maintain the structural integrity of the data across levels and records.

The major disadvantage is that this structure can be handled easily only by CSPro, or by a custom program written in a general-purpose programming language such as C, C++, FORTRAN, or Basic.
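
As a rough illustration of why a hierarchical file needs custom handling, the minimal Python sketch below groups a varying number of records under each case. The layout is an assumption made only for this example (a 4-character case identifier followed by a 1-character record type); the real layout of any DHS file is defined by the CSPro dictionary distributed with that dataset.

```python
from collections import defaultdict

# Hypothetical layout (NOT the actual DHS dictionary): columns 1-4 hold the
# case identifier, column 5 holds the record type, the rest is record data.
CASE_ID = slice(0, 4)
RECORD_TYPE = slice(4, 5)
DATA_START = 5


def group_records_by_case(lines):
    """Return {case_id: [(record_type, data), ...]}, keeping record order."""
    cases = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        cases[line[CASE_ID]].append((line[RECORD_TYPE], line[DATA_START:]))
    return dict(cases)


# Made-up sample: case 0001 has three records, case 0002 has one.
sample = [
    "0001Hhousehold data",
    "0001Cchild one",
    "0001Cchild two",
    "0002Hhousehold data",
]
cases = group_records_by_case(sample)
for case_id, records in cases.items():
    print(case_id, len(records), "records")
```

Because the number of records differs from case to case, the result is a nested structure rather than a rectangular table, which is what most statistical packages expect.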

Flat File Format

Flat files contain a single record for each case in the data file, sometimes with more than 2,000 characters per record. These data files are approximately 10–60 megabytes in size.

In a flat file there is one record for each case. All variables in each case are placed one after the other on the same record. Repeating sections are also placed one after the other on the record, and every case reserves space for the maximum number of occurrences of each section. Each variable in a repeating section is placed immediately after the preceding variable of the same occurrence, so that all variables for occurrence 1 precede all variables for occurrence 2 of a section. The length of each record in the flat data file is fixed.
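
The layout described above can be sketched in Python as follows. Everything specific here is hypothetical: the 4-character identifier, the section with variables A and B, the maximum of three occurrences, and the use of blanks to pad unused occurrences are assumptions chosen to keep the example small. Actual column positions come from the syntax file shipped with each dataset.

```python
# Hypothetical layout (for illustration only): a 4-character case identifier,
# then one repeating section with at most 3 occurrences. Each occurrence
# holds variable A (2 characters) followed by variable B (1 character).
ID_WIDTH = 4
MAX_OCCURRENCES = 3
SECTION_VARS = [("A", 2), ("B", 1)]
OCCURRENCE_WIDTH = sum(width for _, width in SECTION_VARS)
RECORD_LENGTH = ID_WIDTH + MAX_OCCURRENCES * OCCURRENCE_WIDTH


def parse_flat_record(record):
    """Split one fixed-length record into its identifier and occurrences."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(record)}")
    parsed = {"id": record[:ID_WIDTH], "occurrences": []}
    pos = ID_WIDTH
    # All variables of occurrence 1 come before all variables of occurrence 2.
    for _ in range(MAX_OCCURRENCES):
        occurrence = {}
        for name, width in SECTION_VARS:
            occurrence[name] = record[pos:pos + width].strip()
            pos += width
        parsed["occurrences"].append(occurrence)
    return parsed


# Two occurrences are filled; the third is assumed to be blank padding.
record = "0007" + "251" + "302" + "   "
result = parse_flat_record(record)
print(result)
```

Since space for the maximum number of occurrences is reserved in every case, each record has the same length and every variable can be located by its column position alone.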

Multiply occurring variables and sections of data are the main disadvantage of flat files. Each occurrence of every such variable must have its own name because statistical packages do not generally support the use of arrays or subscripts. For example, the third occurrence of the variable named V304 would be named V304$03 in SPSS, or V304_03 in SAS and STATA.
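
The naming convention in the example above can be expressed as a small Python helper. The function name `occurrence_name` is hypothetical, and the two-digit zero padding is inferred from the V304 example rather than from a documented rule.

```python
# Separator used between the variable name and the occurrence number,
# taken from the V304 example in the text.
SEPARATORS = {"spss": "$", "sas": "_", "stata": "_"}


def occurrence_name(variable, occurrence, package):
    """Build the per-occurrence variable name for a statistical package.

    Assumes occurrence numbers are zero-padded to two digits, as in the
    V304$03 / V304_03 example.
    """
    separator = SEPARATORS[package.lower()]
    return f"{variable}{separator}{occurrence:02d}"


print(occurrence_name("V304", 3, "SPSS"))   # V304$03
print(occurrence_name("V304", 3, "STATA"))  # V304_03

# Names for the first three occurrences of V304 in a STATA file.
names = [occurrence_name("V304", i, "stata") for i in range(1, 4)]
print(names)
```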

System File Formats (SPSS, SAS, STATA)

A system file is a binary file that can be read quickly by the relevant statistical package. We provide system files for three popular statistical analysis software packages: SPSS (SAV), SAS (SD2), and STATA (DTA). Each system file contains all the data and descriptive information required to define and use the data, including variable names, variable labels, value labels, and missing values. System files are the preferred format for these data files, particularly if the file is large and if repeated analyses will be run on the same data file.

The main advantages of using the system file (rather than an ASCII data file in conjunction with a syntax file) include faster processing time, not having to modify the syntax file to refer to the correct path of the data file, and the ease of saving changes. The main disadvantage is that system files are platform dependent. For example, the system files that we provide probably cannot be used on a Macintosh computer. Although this has not been tested, the ASCII data and syntax files are more likely to be usable on different platforms, though they will probably require some modification by the user.

Software packages:

  • SPSS Version 13 (SAV files)
  • SAS Version 6 (SD2)
  • STATA SE Version 7 (DTA)
