Data Entry and Data Cleaning in SPSS: A Complete Guide

80% of Analysis Is Data Preparation

Clean data is a precondition for valid results. Rushing through data entry and preparation is a common source of avoidable errors in statistical analysis. This guide walks you through the main steps in SPSS.

Defining Variables (Variable View)

Open SPSS and switch to the Variable View tab. Fill in the following columns for each variable:

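Name: Short, begins with a letter, no spaces (e.g., age, gender, q1)
Type: Numeric or String (use Numeric for coded categories so they can be used in analyses)
Measure: Scale (continuous), Ordinal, or Nominal
Label: Descriptive variable label (e.g., "Participant Age")
Values: Labels for the category codes of nominal and ordinal variables (e.g., 1=Male, 2=Female)
Missing: Define a missing value code that cannot occur as a real value (e.g., 99 or -1; for age, 99 is a possible value, so use a code such as 999 or -1)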
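To make the role of these columns concrete, here is a minimal Python sketch of a codebook that mirrors the Variable View. It is not SPSS syntax; the dictionary layout, the function names, the gender label, and the specific missing codes are assumptions chosen for illustration.

```python
# Hypothetical codebook mirroring the Variable View columns.
# The missing codes (999 for age, 99 for gender) are illustrative choices.
CODEBOOK = {
    "age": {
        "label": "Participant Age",
        "measure": "scale",
        "values": {},
        "missing": {999},
    },
    "gender": {
        "label": "Participant Gender",
        "measure": "nominal",
        "values": {1: "Male", 2: "Female"},
        "missing": {99},
    },
}


def clean_value(variable, raw):
    """Return None for a user-missing code, otherwise the raw value."""
    return None if raw in CODEBOOK[variable]["missing"] else raw


def undefined_codes(variable, column):
    """List codes in a categorical column that have no value label."""
    spec = CODEBOOK[variable]
    allowed = set(spec["values"]) | spec["missing"]
    return sorted({v for v in column if v not in allowed})


print(clean_value("age", 999))                     # None
print(undefined_codes("gender", [1, 2, 3, 99, 2])) # [3]
```

A check like undefined_codes is a quick way to catch typing errors in categorical variables: any code without a value label is either a typo or a category missing from the codebook.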

Data Entry Tips

For large datasets, enter data in Excel and import into SPSS (File → Import Data). For paper-based surveys, consider double data entry to detect input errors: enter data twice independently, then compare for discrepancies.
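The comparison step of double data entry can be sketched in a few lines of Python. This is an illustrative sketch, not an SPSS feature: it assumes both entry passes were exported as CSV with a shared id column, and the function names and sample rows are made up.

```python
import csv
import io


def read_entries(text, id_field="id"):
    """Parse CSV text into {participant id: row dict}."""
    return {row[id_field]: row for row in csv.DictReader(io.StringIO(text))}


def compare_entries(first, second):
    """Return (id, field, value1, value2) for every disagreement."""
    discrepancies = []
    for pid in sorted(set(first) | set(second)):
        row1, row2 = first.get(pid), second.get(pid)
        if row1 is None or row2 is None:
            # A participant entered in only one of the two passes.
            discrepancies.append((
                pid,
                "<row>",
                "present" if row1 is not None else "absent",
                "present" if row2 is not None else "absent",
            ))
            continue
        for field in row1:
            if row1[field] != row2.get(field):
                discrepancies.append((pid, field, row1[field], row2.get(field)))
    return discrepancies


# Made-up sample data: the second pass transposed the digits of one age.
ENTRY_1 = "id,age,q1\n1,34,4\n2,27,5\n3,41,2\n"
ENTRY_2 = "id,age,q1\n1,34,4\n2,72,5\n3,41,2\n"

for item in compare_entries(read_entries(ENTRY_1), read_entries(ENTRY_2)):
    print(item)  # ('2', 'age', '27', '72')
```

Each reported discrepancy is then resolved by checking the original paper form, not by guessing which entry is right.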

Missing Data Management

Use Analyze → Missing Value Analysis to examine missing data patterns.

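<5% missing → Listwise or pairwise deletion is usually acceptable, provided the data are missing completely at random.
5–20% missing → Consider multiple imputation. Mean imputation is simpler but shrinks variance and weakens correlations, so use it with caution.
>20% missing → Consider excluding the variable, or proceed with caution.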
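The thresholds above are rules of thumb, and applying them starts with the percentage of missing values per variable. A minimal Python sketch, assuming missing entries are represented as None; the function names and sample data are made up for illustration.

```python
def percent_missing(column):
    """Percentage of entries that are None (treated here as missing)."""
    if not column:
        raise ValueError("column is empty")
    return 100.0 * sum(v is None for v in column) / len(column)


def suggested_handling(pct):
    """Map a missing percentage to the rule-of-thumb bands in this guide."""
    if pct < 5:
        return "deletion acceptable"
    if pct <= 20:
        return "consider imputation"
    return "consider excluding"


# Made-up example data with 10 cases per variable.
data = {
    "age": [34, 27, None, 41, 29, 33, 38, 45, 52, 26],   # 1 of 10 missing
    "q1": [4, 5, 2, 3, 4, 4, 5, 1, 2, 3],                # none missing
    "income": [None, None, 3, None, 2, 1, 4, 2, 3, 1],   # 3 of 10 missing
}

for name, column in data.items():
    pct = percent_missing(column)
    print(f"{name}: {pct:.0f}% missing -> {suggested_handling(pct)}")
```

The percentage alone does not show why values are missing, so the pattern analysis in SPSS remains the more informative step.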

Outlier Detection

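Boxplot: Analyze → Descriptive Statistics → Explore → Plots. SPSS automatically flags outliers (o, more than 1.5 box lengths from the edge of the box) and extremes (*, more than 3 box lengths).
Z-scores: In Analyze → Descriptive Statistics → Descriptives, check "Save standardized values as variables". |z| > 3.29 (p < .001, two-tailed) indicates a univariate outlier.
Mahalanobis distance: Detects multivariate outliers in regression. Request it under Analyze → Regression → Linear → Save, then compare each value against the chi-square critical value at p < .001, with degrees of freedom equal to the number of predictors.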
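The two univariate checks can be sketched in Python to show what the cutoffs mean. This is an approximation, not a reimplementation of SPSS: the quartiles come from Python's statistics.quantiles, which may differ slightly from the hinges SPSS uses for boxplots, and the function names and age values are made up. Note that |z| > 3.29 can only occur in samples of roughly 13 or more cases.

```python
import statistics


def z_score_outliers(values, cutoff=3.29):
    """Values whose absolute z-score (sample SD, n - 1) exceeds the cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs((v - mean) / sd) > cutoff]


def boxplot_flags(values, inner=1.5, outer=3.0):
    """Split values into (outliers, extremes) by distance from the box in IQRs."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    outliers, extremes = [], []
    for v in values:
        # Distance outside the box; negative when the value lies inside it.
        distance = max(q1 - v, v - q3)
        if distance > outer * iqr:
            extremes.append(v)
        elif distance > inner * iqr:
            outliers.append(v)
    return outliers, extremes


# Made-up ages; 95 is a plausible typo for 59.
ages = [22, 25, 27, 28, 29, 30, 30, 31, 31, 32,
        33, 33, 34, 35, 36, 37, 38, 39, 40, 95]

print(z_score_outliers(ages))  # [95]
print(boxplot_flags(ages))     # ([], [95])
```

A flagged value is a prompt to check the source record first. Only confirmed entry errors should be corrected outright.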

Reverse Coding

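Negatively worded items must be reverse-coded before reliability or factor analysis: Transform → Recode Into Different Variables. For a 5-point scale: recode 1→5, 2→4, 3→3, 4→2, 5→1. Do not leave out 3→3 (or use "All other values → Copy old value(s)"); otherwise the midpoint becomes system-missing in the new variable.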
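Reverse coding follows a simple rule: the new score equals (scale minimum + scale maximum) minus the old score, so the midpoint of an odd-length scale maps to itself. A minimal Python sketch of that rule, with a hypothetical function name and None standing in for a missing response:

```python
def reverse_code(score, scale_min=1, scale_max=5):
    """Reverse a Likert score; missing responses (None) stay missing."""
    if score is None:
        return None
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} is outside {scale_min}-{scale_max}")
    return scale_min + scale_max - score


responses = [1, 2, 3, 4, 5, None]
print([reverse_code(r) for r in responses])  # [5, 4, 3, 2, 1, None]
```

The same arithmetic works for any scale length, for example 8 minus the score on a 7-point scale.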

Computing Subscale Scores

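Use Transform → Compute Variable to create subscale means: e.g., MEAN(q1, q2, q3, q4). MEAN averages the non-missing items and returns a missing value only when every item is missing. To require a minimum number of valid items, add a suffix: MEAN.3(q1, q2, q3, q4) returns a result only when at least three of the four items are valid.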
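How a subscale mean treats missing items is easy to get wrong, so here is a small Python sketch of the behavior of MEAN and of the MEAN.n form, which requires at least n valid items. The function name is hypothetical and None stands in for a missing response.

```python
def subscale_mean(items, min_valid=1):
    """Mean of the non-missing items, or None if fewer than min_valid are valid."""
    valid = [v for v in items if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)


print(subscale_mean([4, None, 5, 3]))                  # 4.0, like MEAN(q1 TO q4)
print(subscale_mean([4, None, None, 3], min_valid=3))  # None, like MEAN.3(...)
print(subscale_mean([None, None, None, None]))         # None
```

Requiring a minimum number of valid items keeps a subscale score from resting on one or two answers. The threshold itself is a judgment call for each scale.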