NOTE: A brief description of how to use syntax in SPSS for Windows follows.
In SPSS for Windows, choose the FILE menu, then NEW, and then SPSS
SYNTAX. A syntax window will open, allowing you to program SPSS directly
rather than through the menus. Type in the code, highlight the lines you
wish to run, and press the "run" button at the top of the
screen (a little black arrow pointing right). Pressing the control and R
keys simultaneously will also run the selected text. The output window
will open to show your output. Be sure you save your syntax window
separately (syntax windows will automatically be given the suffix
".sps"). With the syntax window as the active window, choose
the FILE menu, then SAVE SPSS SYNTAX.
1. Select the variables with which you would like to work. Note
whether each variable is categorical or continuous.
SAS: data libref1.newdata; set libref2.origdata (keep=variables); run;
OR (if you are keeping more variables than you are dropping):
data libref1.newdata; set libref2.origdata (drop=variables); run;
SPSS: GET FILE='path\filename' /KEEP = variablename1 variablename2
variablename3. EXECUTE.
Note: Use the /DROP subcommand in place of /KEEP if you are retaining
most of the variables in the file, and list only the variables you want
to exclude. See page 338 in the SPSS 6.1 Syntax Reference Guide for more
information.
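If you are unsure which variables the original file contains, listing them first can help with the selection in this step. A minimal SAS sketch, assuming the placeholder data set name libref2.origdata used above. PROC CONTENTS reports each variable's name, type (numeric or character), length, and label. It cannot tell you whether a numeric variable is categorical or continuous, so the codebook is still needed for that.

```sas
/* List the variables in the original file before choosing which to keep. */
/* libref2.origdata is the placeholder data set name used in step 1.      */
proc contents data=libref2.origdata;
run;
```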
2. Run a frequency on each categorical variable and univariate
statistics on each continuous variable.
a) CATEGORICAL VARIABLES
SAS: proc freq; tables variables /missing; run;
SPSS: FREQUENCIES VARIABLES= variablename1 variablename2
variablename3.
b) CONTINUOUS VARIABLES
SAS: proc univariate plot; var variables; run;
SPSS: DESCRIPTIVES VARIABLES=variablename /FORMAT=LABELS NOINDEX
/STATISTICS=MEAN SUM STDDEV VARIANCE RANGE MIN MAX SEMEAN KURTOSIS
SKEWNESS /SORT=MEAN (A) .
3. Look in the codebook to note which values refer to missing data.
If you wish to exclude these values (e.g., 99) from analysis, recode
them to system missing:
SAS: if variable =99 then variable =. ;
SPSS: RECODE variablename (99=SYSMIS) . EXECUTE .
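When several numeric variables share the same missing-data code, the recode above can be applied to all of them in one loop. A minimal SAS sketch, assuming three hypothetical variables (variable1 to variable3) that all use 99 for missing. The array name misscode and the index i are illustrative only; adjust the variable list and the code value to match your codebook.

```sas
data libref1.newdata;
  set libref1.newdata;
  /* misscode is a hypothetical array name grouping the variables to recode */
  array misscode{*} variable1 variable2 variable3;
  do i = 1 to dim(misscode);
    /* set the codebook's missing-data value (assumed here to be 99) to system missing */
    if misscode{i} = 99 then misscode{i} = .;
  end;
  drop i;  /* the loop index is not needed in the output file */
run;
```

Note that this sketch overwrites libref1.newdata in place; write to a different data set name if you want to keep the unrecoded values.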
4. Look at distributions
a) CATEGORICAL VARIABLES: Is the number of cases in each category
large enough to allow comparisons? If not, consider lumping categories
(be sure to create a new variable to prevent overwriting the old
information):
SAS: newvariable=oldvariable; if newvariable=value1 or newvariable=value2
then newvariable=value3;
SPSS: RECODE variablename (1 thru 3=1) (4 thru 5=2) INTO
newvariablename . EXECUTE .
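After combining categories, a crosstab of the old variable against the new one is a quick way to confirm that every original value landed in the intended group. A minimal SAS sketch, assuming the placeholder names oldvariable and newvariable from the SAS line above and a data set libref1.newdata that contains both:

```sas
/* Each row of the listing pairs an original value with its recoded value. */
/* LIST prints the table in list form; MISSING includes missing values as a level. */
proc freq data=libref1.newdata;
  tables oldvariable * newvariable / list missing;
run;
```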
b) CONTINUOUS VARIABLES: 1) Is the distribution normal? If not, and
normality is assumed for the statistical procedure you plan on using,
transform the variable (LOG, etc.) and recheck the transformed variable
for normality.
SAS: newvariable =log(oldvariable);
SPSS: COMPUTE newvariable=LN(oldvariablename). EXECUTE.
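To recheck the transformed variable, the univariate procedure from step 2 can be rerun on it. A minimal SAS sketch, assuming the transformed variable is named newvariable and is stored in libref1.newdata. The NORMAL option adds tests of normality to the output. Keep in mind that the log of zero or of a negative value is set to missing, so compare the number of missing values before and after the transformation.

```sas
/* Recheck the distribution of the log-transformed variable. */
proc univariate data=libref1.newdata normal plot;
  var newvariable;
run;
```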
2) Are there any outliers?
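One way to look for outliers in SAS is the Extreme Observations table that PROC UNIVARIATE prints by default, which lists the lowest and highest values of each variable. A minimal sketch, assuming a hypothetical identifier variable named idvariable so that extreme cases can be traced back to their records. Whether a listed value is truly an outlier remains a judgment call.

```sas
proc univariate data=libref1.newdata plot;
  var variable;
  id idvariable;  /* hypothetical ID variable; labels the extreme observations */
run;
```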
5. Run checks for relationships between variables of interest.
a) CATEGORICAL BY CATEGORICAL VARIABLES (crosstabs)
SAS: proc freq; tables variable1 * variable2/ chisq missing; run;
SPSS: CROSSTABS /TABLES=variablename1 BY variablename2 /FORMAT=
AVALUE NOINDEX BOX LABELS TABLES /STATISTIC=CHISQ /CELLS= COUNT ROW
COLUMN .
b) CATEGORICAL BY CONTINUOUS VARIABLES (t-test, which compares two groups)
SAS: proc ttest; class categoricalvariable; var continuousvariable;
run;
SPSS: T-TEST GROUPS=categoricalvariable (level1 level2)
/MISSING=ANALYSIS /VARIABLES=continuousvariable /CRITERIA=CIN(.95) .
c) CONTINUOUS BY CONTINUOUS VARIABLES (scatterplot)
SAS: proc plot; plot variable1 * variable2; run;
SPSS: GRAPH /SCATTERPLOT(BIVAR)=variablename1 WITH variablename2
/MISSING=LISTWISE .
6. Unit of analysis
a) What is the unit of analysis needed to answer your question
(individual, family, dyad, county, state)?
b) Are all variables of interest measured using this unit?
1) If so, skip to the next step.
2) If not, you need to reformat all variables to match your unit of
analysis.
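If some variables are measured at a finer level than your unit of analysis, one common way to reformat them is to summarize them within each unit. A minimal SAS sketch, assuming individual-level records that carry a hypothetical family identifier (familyid) and a hypothetical continuous variable (income), aggregated to one record per family. The output data set name familydata and the variable name meanincome are also illustrative, and the right summary statistic (mean, sum, maximum) depends on your question.

```sas
/* Sort by the family identifier so that BY-group processing can be used. */
proc sort data=libref1.newdata;
  by familyid;
run;

/* Write one record per family containing the family mean of income. */
/* NOPRINT suppresses the printed report; only the output data set is created. */
proc means data=libref1.newdata noprint;
  by familyid;
  var income;
  output out=libref1.familydata mean=meanincome;
run;
```

The output data set also contains the automatic variable _FREQ_, which holds the number of individual records in each family.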
7. Merging files
a) What is the unique identifier to use in merging?
b) What should be done with cases that are present in one file but
missing from the other?
1) Keep all cases