Choice of Statistical Tests
Community Medicine · Biostatistics · lean revision notes
Picking the right statistical test is one of the most reliably tested Biostatistics areas in NEET PG. Examiners give you a clinical scenario (data type, sample size, distribution, number of groups, paired vs unpaired) and expect you to name the single correct test. This note builds the decision logic so you can answer any such MCQ in seconds.
Why this matters: the four questions you must always ask
Before choosing any test, run a quick mental checklist. Nearly every scenario question is solvable by answering four questions:
- What type of data is the outcome variable? (Qualitative/categorical vs quantitative; nominal/ordinal vs interval/ratio.)
- How many groups are being compared? (One sample vs a reference, two groups, or three or more groups.)
- Are the groups paired (related) or unpaired (independent)? (Same subjects measured twice = paired; different subjects = unpaired.)
- Are the data normally distributed and is the sample large enough? (Decides parametric vs non-parametric.)
High-yield: Parametric tests assume the outcome (quantitative) data follow a normal distribution and have homogeneity of variance. When these assumptions fail, or the data are ordinal/skewed/small-sample, switch to the non-parametric equivalent.
Classification of data — the foundation
The single most common mistake is mis-classifying the variable. Tests are chosen by scale of measurement.
| Data type | Scale | Examples | Central tendency |
|---|---|---|---|
| Nominal (qualitative) | Categories, no order | Blood group, gender, alive/dead | Mode, proportion |
| Ordinal (qualitative) | Ordered categories | Pain score (mild/mod/severe), tumour stage | Median |
| Discrete (quantitative) | Whole-number counts | Number of children, attacks/year | Mean/median |
| Continuous interval | Equal intervals, no true zero | Temperature in °C, IQ | Mean |
| Continuous ratio | True zero | Weight, height, BP, haemoglobin | Mean |
High-yield: Nominal and ordinal = qualitative → usually non-parametric / chi-square family. Interval and ratio = quantitative → parametric tests if normally distributed.
Parametric vs non-parametric — the master comparison
| Feature | Parametric | Non-parametric |
|---|---|---|
| Distribution assumption | Normal (Gaussian) | Distribution-free |
| Data type | Quantitative (interval/ratio) | Ordinal, ranked, or skewed quantitative |
| Sample size | Larger / normality met | Small samples acceptable |
| Measure used | Mean, standard deviation | Median, ranks |
| Statistical power | Higher (if assumptions met) | Lower |
| Effect of outliers | Strongly affected | Resistant (robust) |
High-yield: Non-parametric tests use ranks/medians, are robust to outliers, and are preferred for small samples or ordinal data, but have lower power than the parametric test when the data are actually normal.
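The robustness point can be illustrated with a minimal sketch. The stay lengths below are made-up numbers and `ranks` is a hypothetical helper written for this note, not a library function. One extreme value shifts the mean a lot but leaves the median and the ranks untouched.

```python
from statistics import mean, median

def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

# Hypothetical hospital stay lengths in days; 60 is an outlier.
stay = [3, 4, 4, 5, 6]
stay_with_outlier = [3, 4, 4, 5, 60]

print(mean(stay), median(stay))                            # 4.4 4
print(mean(stay_with_outlier), median(stay_with_outlier))  # 15.2 4
print(ranks(stay))                                         # [1.0, 2.5, 2.5, 4.0, 5.0]
print(ranks(stay_with_outlier))                            # same ranks as above
```

Because rank-based tests only see the last line of output, the size of the outlier does not matter to them, which is also why they discard information and lose power when the data are genuinely normal.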
The decision flow for quantitative outcomes
Use this stepwise approach for a continuous outcome variable:
Step 1 → Are the data normally distributed? (Check via Shapiro–Wilk test, Kolmogorov–Smirnov test, histogram, or skewness.) No → go to non-parametric column.
Step 2 → How many groups? One / Two / ≥Three.
Step 3 → If two groups: paired or unpaired?
Step 4 → Pick the test from the table below.
| Scenario | Parametric (normal) | Non-parametric (skewed/ordinal) |
|---|---|---|
| One sample vs known mean | One-sample t-test | One-sample Wilcoxon signed-rank |
| Two independent groups | Unpaired (independent) Student t-test | Mann–Whitney U test (Wilcoxon rank-sum) |
| Two paired groups (before–after, matched) | Paired t-test | Wilcoxon signed-rank test |
| ≥3 independent groups | One-way ANOVA | Kruskal–Wallis test |
| ≥3 paired/repeated measures | Repeated-measures ANOVA | Friedman test |
| Association of two continuous variables | Pearson correlation | Spearman rank correlation |
High-yield: The classic NEET PG pairing to memorise — Mann–Whitney U is the non-parametric equivalent of the unpaired t-test, and Wilcoxon signed-rank is the non-parametric equivalent of the paired t-test. These two are confused constantly.
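The table above can be written as a small lookup, which may help when drilling scenarios. This is only a sketch: the function name `choose_quantitative_test` and its parameters are illustrative, and the correlation row is left out because it is not a group comparison.

```python
def choose_quantitative_test(normal: bool, groups: int, paired: bool = False) -> str:
    """Return the test for a quantitative outcome, following the table above.

    groups=1 means one sample compared with a known/reference mean.
    """
    if groups < 1:
        raise ValueError("groups must be at least 1")
    if groups == 1:
        return "One-sample t-test" if normal else "One-sample Wilcoxon signed-rank"
    if groups == 2:
        if paired:
            return "Paired t-test" if normal else "Wilcoxon signed-rank test"
        return "Unpaired t-test" if normal else "Mann-Whitney U test"
    # Three or more groups.
    if paired:
        return "Repeated-measures ANOVA" if normal else "Friedman test"
    return "One-way ANOVA" if normal else "Kruskal-Wallis test"

print(choose_quantitative_test(normal=False, groups=2))               # Mann-Whitney U test
print(choose_quantitative_test(normal=True, groups=2, paired=True))   # Paired t-test
print(choose_quantitative_test(normal=False, groups=3))               # Kruskal-Wallis test
```

The order of the checks mirrors Steps 1 to 4: number of groups first, then pairing, with normality deciding the column.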
Mnemonic for the non-parametric set
"My Will Knows Friends" → Mann-Whitney (2 unpaired), Wilcoxon (2 paired), Kruskal-Wallis (≥3 unpaired), Friedman (≥3 paired).
The decision flow for qualitative (categorical) outcomes
When the outcome is a proportion/category (e.g., cured vs not cured, exposed vs not exposed), you compare frequencies.
Step 1 → Build a contingency table (e.g., 2×2).
Step 2 → Are the expected cell counts adequate? (Rule: expected frequency ≥5 in all cells; for 2×2 some texts use total n ≥ 40.)
Step 3 → Choose:
- Chi-square (χ²) test → comparing proportions/association between categorical variables when expected counts are adequate.
- Fisher's exact test → small sample / any expected cell count <5 (especially in a 2×2 table).
- McNemar's test → paired/matched categorical data (e.g., before–after on the same subjects, matched case–control with discordant pairs).
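- Yates' correction (continuity correction) → applied to chi-square for a 2×2 table: 0.5 is subtracted from each |observed − expected| difference, which lowers χ² and so reduces the overestimation of significance when numbers are modest.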
| Categorical scenario | Test of choice |
|---|---|
| 2×2 or r×c table, n large, expected ≥5 | Chi-square test |
| 2×2 table, expected cell count <5 (small sample) | Fisher's exact test |
| Paired/matched proportions (before vs after, same person) | McNemar's test |
| 2×2 with modest numbers (continuity issue) | Chi-square with Yates' correction |
| Trend across ordered categories | Chi-square for trend (Cochran–Armitage) |
High-yield: Fisher's exact test is the answer whenever the question states a small sample or expected frequency <5 in a contingency table. McNemar's test is the answer for paired categorical / matched data — the buzzword is "same subjects before and after" or "matched pairs."
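The expected-count check in Step 2 is simple arithmetic: expected count = row total × column total ÷ grand total. The sketch below uses illustrative function names and made-up tables, and assumes no row or column total is zero. It computes the expected counts, picks between chi-square and Fisher's exact by the "<5" rule, and shows how Yates' correction lowers the χ² statistic.

```python
def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table, yates=False):
    """Pearson chi-square statistic; yates=True applies the 0.5 continuity correction."""
    stat = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for obs, exp in zip(obs_row, exp_row):
            diff = abs(obs - exp)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / exp
    return stat

def choose_categorical_test(table, paired=False):
    """Apply the rule of thumb from these notes (any expected count <5 means Fisher's)."""
    if paired:
        return "McNemar's test"
    smallest = min(min(row) for row in expected_counts(table))
    return "Fisher's exact test" if smallest < 5 else "Chi-square test"

large = [[30, 20], [15, 35]]   # hypothetical 2x2 table, n = 100
small = [[2, 8], [9, 6]]       # hypothetical 2x2 table, n = 25

print(choose_categorical_test(large))            # Chi-square test
print(choose_categorical_test(small))            # Fisher's exact test
print(round(chi_square(large), 2))               # 9.09
print(round(chi_square(large, yates=True), 2))   # 7.92
```

In the small table the smallest expected count is 10 × 11 ÷ 25 = 4.4, so the rule points to Fisher's exact test even though only one observed cell looks small.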
Correlation and regression
- Pearson correlation coefficient (r) → strength and direction of a linear relationship between two normally distributed continuous variables (parametric). Ranges −1 to +1.
- Spearman rank correlation (ρ) → non-parametric correlation for ordinal data or non-normal continuous data; based on ranks.
- Linear regression → predicts a continuous outcome from one or more predictors.
- Logistic regression → outcome is binary (e.g., disease yes/no); gives odds ratios, used widely in case-control and multivariable adjustment.
High-yield: A correlation coefficient near 0 means no linear relationship — but a strong non-linear (e.g., U-shaped) relationship can still exist with r ≈ 0. Correlation ≠ causation.
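A quick numerical check of the U-shaped case, as a sketch with made-up values. `pearson_r` is written out by hand here so the formula stays visible. The quadratic relationship is perfectly predictable from x, yet r comes out as 0.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]   # straight line
y_u_shaped = [v ** 2 for v in x]    # U-shaped curve

print(round(pearson_r(x, y_linear), 3))    # 1.0
print(round(pearson_r(x, y_u_shaped), 3))  # 0.0
```

The positive and negative halves of the U cancel out in the covariance, which is why a scatter plot should be inspected before reading r ≈ 0 as "no relationship".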
ANOVA — the comparison of three or more means
One-way ANOVA compares means across ≥3 independent groups using the F-test (ratio of between-group variance to within-group variance).
- A significant ANOVA tells you that at least one group differs, but not which one.
- Post-hoc tests (Tukey's HSD, Bonferroni, Scheffé) identify which specific pairs differ.
High-yield: Using multiple t-tests instead of ANOVA for ≥3 groups inflates the Type I error (α). ANOVA controls this. This is a favourite reasoning MCQ.
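The size of the inflation can be worked out directly. With g groups there are g(g − 1)/2 pairwise t-tests, and if each is run at α = 0.05 the chance of at least one false positive is 1 − (1 − α)^k. The sketch below makes the simplifying assumption that the k comparisons are independent, which is not strictly true for pairwise tests on shared groups, so treat the output as an approximation.

```python
from math import comb

def familywise_error(groups: int, alpha: float = 0.05) -> float:
    """Approximate chance of >=1 false positive across all pairwise tests.

    Assumes the pairwise comparisons are independent (a simplification).
    """
    pairs = comb(groups, 2)
    return 1 - (1 - alpha) ** pairs

def bonferroni_alpha(groups: int, alpha: float = 0.05) -> float:
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / comb(groups, 2)

print(round(familywise_error(3), 3))   # 0.143  (3 pairwise tests)
print(round(familywise_error(4), 3))   # 0.265  (6 pairwise tests)
print(round(bonferroni_alpha(4), 4))   # 0.0083
```

So four groups compared by repeated t-tests carry roughly a 26% overall false-positive risk instead of 5%.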
Z-test vs t-test
| Feature | Z-test | t-test |
|---|---|---|
| Sample size | Large (n > 30) | Small (n ≤ 30) |
| Population SD | Known | Unknown (use sample SD) |
| Distribution used | Normal (Z) | Student's t-distribution |
High-yield: Use the t-test for small samples with unknown population SD; the Z-test for large samples or when population SD is known. The t-distribution is flatter and broader, approaching normal as n increases.
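Both tests share the same form of test statistic, (sample mean − reference mean) ÷ (SD ÷ √n). What differs is which SD goes in and which distribution the result is referred to. A minimal sketch for the one-sample case, with made-up haemoglobin values and an illustrative function name:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_statistic(sample, reference_mean, population_sd=None):
    """Return ('z', value) if the population SD is supplied, else ('t', value).

    The t statistic uses the sample SD and has len(sample) - 1 degrees of freedom.
    """
    n = len(sample)
    if population_sd is not None:
        sd, name = population_sd, "z"
    else:
        sd, name = stdev(sample), "t"
    statistic = (mean(sample) - reference_mean) / (sd / sqrt(n))
    return name, statistic

# Hypothetical haemoglobin values (g/dL) compared with a reference mean of 12.0.
hb = [11.2, 12.1, 10.8, 11.5, 12.4, 11.0, 11.9, 10.9]

print(one_sample_statistic(hb, 12.0))                      # ('t', ...) using sample SD
name, z = one_sample_statistic(hb, 12.0, population_sd=1.0)
print(name, round(z, 2))                                   # z -1.48
```

The sketch stops at the statistic. Turning it into a p-value needs the t or Z distribution tables (or a statistics package), which is outside what the standard library offers here.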
Putting it together — worked scenario logic
- Compare mean haemoglobin between two independent groups of pregnant women, data normal → Unpaired Student t-test.
- Compare BP before and after a drug in the same 20 patients → Paired t-test (or Wilcoxon signed-rank if skewed).
- Compare median pain scores (ordinal) across 3 treatment arms → Kruskal–Wallis test.
- Association between smoking (yes/no) and lung cancer (yes/no), large sample → Chi-square test.
- Same 2×2 association but only 25 subjects with a cell of 2 → Fisher's exact test.
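- Compare the proportion of positive results of a new test vs an old test on the same patients (paired categorical) → McNemar's test. (If the question asks about agreement between the two tests rather than a difference in proportions, the answer is kappa.)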
- Relationship between age and serum cholesterol (both continuous, normal) → Pearson correlation.
Tests of normality and other named tests
- Shapiro–Wilk test and Kolmogorov–Smirnov test → check whether data are normally distributed (decide parametric vs non-parametric).
- Levene's / Bartlett's test → check homogeneity of variance.
- Kappa (κ) statistic → measures inter-observer agreement for categorical data (Cohen's kappa). Values: ≤0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 very good.
- Log-rank test → compares survival curves (Kaplan–Meier) between groups.
- Cox proportional hazards regression → multivariable survival analysis giving hazard ratios.
High-yield: For survival/time-to-event data, the comparison test is the log-rank test and the regression model is Cox regression. For agreement between two observers on categorical data, use the kappa statistic; for two continuous methods, use the Bland–Altman plot.
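Kappa corrects the observed agreement for the agreement expected by chance: κ = (observed − expected) ÷ (1 − expected). The sketch below uses a made-up 2×2 table for two observers and an illustrative function name.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of subjects placed in category i by observer A
    and in category j by observer B.
    """
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Chance agreement: product of the two observers' marginal proportions.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical readings of 100 X-rays (abnormal / normal) by two observers.
readings = [[40, 10],
            [5, 45]]

print(round(cohen_kappa(readings), 2))  # 0.7
```

Here the raw agreement is 85%, but half of that is expected by chance alone, so κ = 0.35 ÷ 0.50 = 0.70, which falls in the "good" band.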
Common pitfalls
- Using a parametric test on skewed/ordinal data → misleading results; the median is more appropriate.
- Multiple comparisons without correction → inflated false-positive rate; correct with Bonferroni.
- Confusing paired with unpaired → an unpaired test on before–after data ignores the within-subject pairing, violates the independence assumption, and loses power.
- Ignoring expected cell counts → applying chi-square when Fisher's exact is required.
- Over-interpreting correlation as causation.
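The paired vs unpaired pitfall is easy to see numerically. In the sketch below (made-up blood pressures, illustrative function names) every patient drops by a few mmHg, but patients differ widely from each other. The paired statistic works on the within-patient differences. The unpaired statistic is swamped by between-patient spread.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Paired t statistic: mean difference / standard error of the differences."""
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def unpaired_t(group1, group2):
    """Student's t statistic with pooled variance (assumes equal variances)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical systolic BP (mmHg) in the same five patients.
before = [120, 130, 140, 150, 160]
after = [118, 127, 137, 146, 155]

print(round(paired_t(before, after), 2))    # 6.67
print(round(unpaired_t(before, after), 2))  # 0.35
```

The same data give a large statistic when analysed as paired and a negligible one when wrongly analysed as independent groups.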
Key differentials (tests that get confused)
| Often confused | Correct distinction |
|---|---|
| Unpaired t vs Mann–Whitney U | Same purpose (2 independent groups); choose by normality |
| Paired t vs Wilcoxon signed-rank | Same purpose (2 paired groups); choose by normality |
| Chi-square vs Fisher's exact | Sample size / expected count <5 → Fisher's |
| Chi-square vs McNemar | Independent → chi-square; paired/matched → McNemar |
| ANOVA vs Kruskal–Wallis | ≥3 groups; normal → ANOVA, non-normal → Kruskal–Wallis |
| Pearson vs Spearman | Continuous-normal → Pearson; ordinal/non-normal → Spearman |
Recently asked / exam angle
NEET PG and INI-CET have repeatedly tested scenario-based selection rather than definitions:
- "Non-parametric test for comparing two independent groups" → Mann–Whitney U (very frequent).
- "Non-parametric equivalent of paired t-test" → Wilcoxon signed-rank.
- "Best test for a 2×2 table with small numbers / expected frequency <5" → Fisher's exact test.
- "Test for before-and-after categorical outcome in same patients" → McNemar's test.
- "Test to compare means of four groups of normally distributed data" → One-way ANOVA, with a follow-up on why not multiple t-tests (Type I error).
- "Correlation for ordinal data" → Spearman.
- "Test to compare survival curves" → log-rank.
- "Statistic for inter-observer agreement" → kappa.
The trick is always in the buzzwords: independent vs paired, small sample / expected <5, ordinal vs continuous, normal vs skewed, three or more groups, before-after.
Rapid revision
- Quantitative + normal + 2 independent groups → unpaired t-test; if skewed → Mann–Whitney U.
- Quantitative + normal + 2 paired groups → paired t-test; if skewed → Wilcoxon signed-rank.
- Quantitative + ≥3 independent groups → ANOVA; if non-normal → Kruskal–Wallis.
- ≥3 paired/repeated measures → repeated-measures ANOVA; if non-normal → Friedman test.
- Categorical + large sample → chi-square; expected cell <5 → Fisher's exact.
- Paired/matched categorical → McNemar's test.
- Two continuous normal variables → Pearson; ordinal/non-normal → Spearman.
- Non-parametric tests use ranks and medians, are robust to outliers, and have lower power than the parametric equivalent when the data are normal.
- Multiple t-tests for ≥3 groups inflate Type I error — use ANOVA.
- Small sample, unknown population SD → t-test; large sample or known SD → Z-test.
- Survival curves → log-rank; multivariable survival → Cox regression (hazard ratio).
- Inter-observer agreement (categorical) → kappa statistic; binary outcome prediction → logistic regression (odds ratio).