Using Government Data for Research
Michael R. Hudson, PhD
Mark J. Kittleson, PhD
The federal government has enormous amounts of health-related data available for analysis. Although the federal government takes the time to collect these data, relatively little is done at the federal level to evaluate them. Thus, it is natural for health behavior researchers to consider using these data for research. Over the past 10 years, the data have become easier to obtain, either through the Internet or on CD-ROM. A listing of the various databases available from the Centers for Disease Control and Prevention's National Center for Health Statistics, and their prices, can be found at http://www.cdc.gov/nchswww/products/catalogs/subject/cdprice.htm. In addition, many of these databases can be downloaded directly from the CDC's server (presuming you have the time and hard-disk space to download the files).
One particular data set, the National Health Interview Survey (NHIS), is a nationwide household survey of the U.S. civilian noninstitutionalized population. Each week, trained interviewers from the Census Bureau interview a probability sample of households. Over a year, the sample comprises 36,000 to 47,000 households, including 92,000 to 125,000 persons, depending on the year.
Although the data are easily accessible, it is often difficult to work out how to transfer them from the CD-ROM into a format usable for further analysis. This article therefore gives the reader a step-by-step account of how to obtain data from the National Health Interview Survey from 1993 to the present. The steps are specific to the NHIS but, with minor adaptations, will work with other data sets as well.
Instructions for Downloading NHIS Data Files From 1994(1)
INSTALLING AND RUNNING CD-ROMS
1. On the WINDOWS 95 desktop, click Start at the bottom of the screen. Select Programs and then click MS-DOS Prompt.
2. At C:\WINDOWS> type d:
3. At D:\> type Install
4. At "Enter destination drive for installation:" type c
(if the CD-ROM predates 1992, type S.EXE /f /e /y20)
6. Follow instructions at bottom of screen.
VIEWING DATA
1. At top of screen, select Browse [enter]
2. Select File [enter]
3. Select desired file (e.g., Year 2000) [enter]
4. Select desired information.
-- Use arrow keys to select cells, highlighted in red.
-- Blue boxes at top and bottom of screen will explain data in the red highlighted cells.
-- ESC will return to main menu.
DOWNLOADING DATA TO HARD DRIVE AND FLOPPY DISK
1. At main menu, select Subset [enter]
2. Select Records [enter]
3. Skip "subset name" [enter]
Skip "description" [enter]
Skip "files" [enter]
4. At files box, select topic (e.g., Year 2000.DAT)
5. Press [enter] to highlight in red the selected topic and then press [F10] to accept.
6. Skip "Records (see below)" [enter]
7. Y should be highlighted red.
8. At "Field" press [enter]
9. Select variable(s) (e.g., RACE)
-- Repeat this process with each category (Op, Value, Connector)
-- Example:
a. RACER1 [enter]
b. Op, press [9]
c. Value, highlight 1, 2, 3 by placing the cursor on the codes for race(s), press [enter] for each one, then press [F10].
d. Connector, select AND [enter]
10. Repeat steps a-d with additional variables (e.g., sex, age).
11. After all desired variables are selected, press [ESC] and then [F10].
12. You are now back at the main menu.
13. Select Fields. Select the variable(s) previously selected (e.g., RACER1) [enter] [F10].
14. Select Export.
15. Select ASCII to view the data in a spreadsheet, or Other to view the data in a database file.
Example:
-- Select ASCII
-- Export directory, hit [enter]
-- Export file destination: (give the file a name, e.g., Yr2000info)
-- Generate statements: select Y or N for each of codebook, SAS, BMDP, SPSS, and EPI
16. Close CD-ROM screen and return to Windows desktop.
17. The data can be saved directly to a floppy disk (3.5-inch) or to the hard drive (C:). If saving to the hard drive, continue following the instructions; otherwise the procedure is complete.
18. Find the NHIS (or other CD-ROM title) being used and double-click it to open.
19. Find the name of the data file you created (e.g., Yr2000info) and double-click it to open.
20. The computer may ask how to open the file. Choose a spreadsheet if the file was exported as ASCII from the CD-ROM; choose a database if it was exported as "Other."
21. Data should then be displayed on screen.
22. Remember the order in which you chose the variables, because variable names will not appear at the top of the columns in the spreadsheet or database.
23. If all the desired data are present, you can now rename the file and save it to drive A:.
24. Three files will be created with each set of data that is downloaded: (1) a file identifying the coding of the variables; (2) a file with statistical program commands (e.g., SAS); (3) a file containing all of the data (with no labels given).
25. Write a program to analyze the data. Insert the appropriate statistical commands (e.g., SAS commands) into the file containing the data and then run the program using statistical software.
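Step 25 leaves the choice of analysis software open. As one possible illustration (not the authors' method, which relies on the generated SAS statements), the Python sketch below reads an exported data file, reattaches variable names in the order they were selected, applies a subset like the RACER1 example in step 9, and tabulates one variable. It assumes the ASCII export is comma-delimited with one record per line; the field names SEX and AGE, the function names, and the sample values are hypothetical and should be checked against the codebook file produced in step 24.

```python
import csv
import io
from collections import Counter

# Hypothetical field order. The exported data file has no header row,
# so names must be listed in the order the fields were chosen.
# Only RACER1 comes from the article; SEX and AGE are illustrative.
FIELD_ORDER = ["RACER1", "SEX", "AGE"]


def read_export(text, field_order):
    """Parse a header-less, comma-delimited export into a list of dicts.

    Assumes one record per line and one value per selected field.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(field_order):
            raise ValueError(
                f"expected {len(field_order)} fields, got {len(row)}: {row}"
            )
        records.append({name: value.strip() for name, value in zip(field_order, row)})
    return records


def subset(records, conditions):
    """Keep records that satisfy every (field, allowed_values) condition.

    Roughly mirrors step 9: each condition is a Field/Value choice, and
    the conditions are joined as if by the AND connector.
    """
    return [
        r for r in records
        if all(r[field] in allowed for field, allowed in conditions)
    ]


def frequency(records, field):
    """Unweighted count of each value of one field."""
    return Counter(r[field] for r in records)


if __name__ == "__main__":
    # Made-up sample lines for illustration only (not NHIS data).
    sample = "1,1,34\n2,2,51\n4,1,19\n1,2,67\n"
    rows = read_export(sample, FIELD_ORDER)
    chosen = subset(rows, [("RACER1", {"1", "2", "3"})])
    for value, count in sorted(frequency(chosen, "SEX").items()):
        print(f"SEX={value}: {count}")
```

The counts produced this way are unweighted. Because the NHIS uses a complex probability sample, population estimates would also require the survey's sampling weights and design variables, which this sketch ignores.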
Michael R. Hudson, PhD, is doing post-doctoral work at Austin Peay State University; Mark J. Kittleson, PhD, is professor of health education at Southern Illinois University. For specific questions on strategies for downloading data, contact Dr. Hudson at mhudson@hop-uky.campus.mci.net. For information about the Electronic Notes column, contact Dr. Kittleson at kittle@siu.edu.
1. The following assumptions are made: C:\ is the hard drive; D:\ is the CD-ROM drive. [Enter] means you should press the Enter key on the keyboard. Please note that NHIS CD-ROMs prior to 1992 will not run in WINDOWS 95.