Choice of Statistical Tests

Community Medicine · Biostatistics · lean revision notes

Picking the right statistical test is one of the most reliably tested Biostatistics areas in NEET PG. Examiners give you a clinical scenario (data type, sample size, distribution, number of groups, paired vs unpaired) and expect you to name the single correct test. This note builds the decision logic so you can answer any such MCQ in seconds.

Why this matters: the four questions you must always ask

Before choosing any test, run a quick mental checklist. Nearly every scenario question is solvable by answering four questions:

  1. What type of data is the outcome variable? (Qualitative/categorical vs quantitative; nominal/ordinal vs interval/ratio.)
  2. How many groups are being compared? (One sample vs a reference, two groups, or three or more groups.)
  3. Are the groups paired (related) or unpaired (independent)? (Same subjects measured twice = paired; different subjects = unpaired.)
  4. Are the data normally distributed and is the sample large enough? (Decides parametric vs non-parametric.)

High-yield: Parametric tests assume the outcome (quantitative) data follow a normal distribution and have homogeneity of variance. When these assumptions fail, or the data are ordinal/skewed/small-sample, switch to the non-parametric equivalent.

Classification of data — the foundation

The single most common mistake is mis-classifying the variable. Tests are chosen by scale of measurement.

Data type | Scale | Examples | Central tendency
Nominal (qualitative) | Categories, no order | Blood group, gender, alive/dead | Mode, proportion
Ordinal (qualitative) | Ordered categories | Pain score (mild/mod/severe), tumour stage | Median
Discrete (quantitative) | Whole-number counts | Number of children, attacks/year | Mean/median
Continuous interval | Equal intervals, no true zero | Temperature in °C, IQ | Mean
Continuous ratio | True zero | Weight, height, BP, haemoglobin | Mean

High-yield: Nominal and ordinal = qualitative → usually non-parametric / chi-square family. Interval and ratio = quantitative → parametric tests if normally distributed.

Parametric vs non-parametric — the master comparison

Feature | Parametric | Non-parametric
Distribution assumption | Normal (Gaussian) | Distribution-free
Data type | Quantitative (interval/ratio) | Ordinal, ranked, or skewed quantitative
Sample size | Larger / normality met | Small samples acceptable
Measure used | Mean, standard deviation | Median, ranks
Statistical power | Higher (if assumptions met) | Lower
Effect of outliers | Strongly affected | Resistant (robust)

High-yield: Non-parametric tests use ranks/medians, are robust to outliers, and are preferred for small samples or ordinal data, but have lower power than the parametric test when the data are actually normal.
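
A minimal sketch of why the mean is outlier-sensitive and the median is not. The hospital-stay values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical lengths of hospital stay in days (illustrative values only).
stays = [4, 5, 5, 6, 7]
stays_with_outlier = stays + [40]  # one extreme value added

mean_before, median_before = mean(stays), median(stays)
mean_after, median_after = mean(stays_with_outlier), median(stays_with_outlier)

# The mean is dragged towards the outlier; the median barely moves.
print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

Here a single extreme value roughly doubles the mean while the median shifts by half a day, which is why skewed data are summarised by the median and compared with rank-based tests.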

The decision flow for quantitative outcomes

Use this stepwise approach for a continuous outcome variable:

Step 1 → Are the data normally distributed? (Check via Shapiro–Wilk test, Kolmogorov–Smirnov test, histogram, or skewness.) If not, use the non-parametric column of the table below.

Step 2 → How many groups? One / Two / ≥Three.

Step 3 → If two groups: paired or unpaired?

Step 4 → Pick the test from the table below.

Scenario | Parametric (normal) | Non-parametric (skewed/ordinal)
One sample vs known mean | One-sample t-test | One-sample Wilcoxon signed-rank
Two independent groups | Unpaired (independent) Student t-test | Mann–Whitney U test (Wilcoxon rank-sum)
Two paired groups (before–after, matched) | Paired t-test | Wilcoxon signed-rank test
≥3 independent groups | One-way ANOVA | Kruskal–Wallis test
≥3 paired/repeated measures | Repeated-measures ANOVA | Friedman test
Association of two continuous variables | Pearson correlation | Spearman rank correlation

High-yield: The classic NEET PG pairing to memorise — Mann–Whitney U is the non-parametric equivalent of the unpaired t-test, and Wilcoxon signed-rank is the non-parametric equivalent of the paired t-test. These two are confused constantly.
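
The table above can be written as a small lookup, sketched here for revision purposes. The function name and its parameters are hypothetical, and the normality decision is assumed to have been made already (Step 1).

```python
def choose_quantitative_test(normal: bool, groups: int, paired: bool = False) -> str:
    """Return the test named in the table above for a quantitative outcome.

    normal: True if the normality assumption is judged to hold
    groups: number of groups compared (1, 2, or 3 and above)
    paired: True for related samples (before-after, matched, repeated measures)
    """
    if groups < 1:
        raise ValueError("groups must be at least 1")
    if groups == 1:
        return "One-sample t-test" if normal else "One-sample Wilcoxon signed-rank"
    if groups == 2:
        if paired:
            return "Paired t-test" if normal else "Wilcoxon signed-rank test"
        return "Unpaired Student t-test" if normal else "Mann-Whitney U test"
    # Three or more groups
    if paired:
        return "Repeated-measures ANOVA" if normal else "Friedman test"
    return "One-way ANOVA" if normal else "Kruskal-Wallis test"


# Example: skewed data, two independent groups
print(choose_quantitative_test(normal=False, groups=2, paired=False))
```

The branching order (number of groups, then pairing, then normality) mirrors how the rows and columns of the table are read.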

Mnemonic for the non-parametric set

"My Will Knows Friends" → Mann-Whitney (2 unpaired), Wilcoxon (2 paired), Kruskal-Wallis (≥3 unpaired), Friedman (≥3 paired).

The decision flow for qualitative (categorical) outcomes

When the outcome is a proportion/category (e.g., cured vs not cured, exposed vs not exposed), you compare frequencies.

Step 1 → Build a contingency table (e.g., 2×2).

Step 2 → Are the expected cell counts adequate? (Rule: expected frequency ≥5 in all cells; for 2×2 some texts use total n ≥ 40.)

Step 3 → Choose:

  • Chi-square (χ²) test → comparing proportions/association between categorical variables when expected counts are adequate.
  • Fisher's exact test → small sample / any expected cell count <5 (especially in a 2×2 table).
  • McNemar's test → paired/matched categorical data (e.g., before–after on the same subjects, matched case–control with discordant pairs).
  • Yates' correction (continuity correction) → applied to chi-square for a 2×2 table to reduce overestimation of the χ² value (and hence of significance) when numbers are modest.

Categorical scenario | Test of choice
2×2 or r×c table, n large, expected ≥5 | Chi-square test
2×2 table, expected cell count <5 (small sample) | Fisher's exact test
Paired/matched proportions (before vs after, same person) | McNemar's test
2×2 with modest numbers (continuity issue) | Chi-square with Yates' correction
Trend across ordered categories | Chi-square for trend (Cochran–Armitage)
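
The expected-count rule in Step 2 can be checked by hand: expected count = (row total × column total) / grand total. The sketch below applies that rule to a 2×2 table and, for illustration, computes a two-sided Fisher's exact p-value from the hypergeometric distribution. The function names and the cell counts are hypothetical, and the two-sided convention used (summing all tables no more probable than the observed one) is one common choice, not the only one.

```python
from math import comb


def expected_counts(a, b, c, d):
    """Expected cell counts for the 2x2 table [[a, b], [c, d]] under no association."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


def pick_categorical_test(a, b, c, d):
    """Expected-count rule: any expected cell below 5 points to Fisher's exact test."""
    smallest = min(min(row) for row in expected_counts(a, b, c, d))
    return "Fisher's exact test" if smallest < 5 else "Chi-square test"


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value with the table margins held fixed."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table that is no more probable than the one observed
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical small study: 25 subjects, one observed cell of 2
print(expected_counts(2, 10, 8, 5))        # smallest expected count is 4.8
print(pick_categorical_test(2, 10, 8, 5))  # Fisher's exact test
print(round(fisher_exact_two_sided(2, 10, 8, 5), 4))
```

Note that the rule is applied to the expected counts, not the observed ones: the observed cell of 2 is a hint, but it is the expected value of 4.8 that triggers Fisher's exact test.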

High-yield: Fisher's exact test is the answer whenever the question states a small sample or expected frequency <5 in a contingency table. McNemar's test is the answer for paired categorical / matched data — the buzzword is "same subjects before and after" or "matched pairs."
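
McNemar's test uses only the discordant pairs (subjects whose category changed between the two measurements). A minimal sketch of the statistic, χ² = (b − c)² / (b + c) with 1 degree of freedom, follows; the function name and the pair counts are hypothetical, and the optional continuity correction shown is the commonly taught (|b − c| − 1)² form.

```python
CHI2_CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def mcnemar_statistic(b: int, c: int, continuity: bool = True) -> float:
    """McNemar chi-square (1 df) from the two discordant-pair counts b and c."""
    if b + c == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    diff = abs(b - c) - (1 if continuity else 0)
    diff = max(diff, 0)  # the correction must not push the difference below zero
    return diff ** 2 / (b + c)


# Hypothetical before-after data: 15 patients changed positive -> negative,
# 5 changed negative -> positive; concordant pairs are ignored by the test.
uncorrected = mcnemar_statistic(15, 5, continuity=False)  # 100 / 20 = 5.0
corrected = mcnemar_statistic(15, 5)                      # 81 / 20 = 4.05
print(uncorrected, corrected, corrected > CHI2_CRITICAL_1DF_05)
```

Both values exceed 3.841, so in this made-up example the change in proportions would be significant at the 5% level.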

Correlation and regression

  • Pearson correlation coefficient (r) → strength and direction of a linear relationship between two normally distributed continuous variables (parametric). Ranges −1 to +1.
  • Spearman rank correlation (ρ) → non-parametric correlation for ordinal data or non-normal continuous data; based on ranks.
  • Linear regression → predicts a continuous outcome from one or more predictors.
  • Logistic regression → outcome is binary (e.g., disease yes/no); gives odds ratios, used widely in case-control and multivariable adjustment.

High-yield: A correlation coefficient near 0 means no linear relationship — but a strong non-linear (e.g., U-shaped) relationship can still exist with r ≈ 0. Correlation ≠ causation.
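
A short illustration of the r ≈ 0 point, using made-up values: a perfectly U-shaped relationship (y = x²) gives a Pearson coefficient of zero even though y is completely determined by x. The helper `pearson_r` is written out here only so the arithmetic is visible.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]  # perfect straight line
y_u_shaped = [v ** 2 for v in x]   # perfect U-shaped curve

r_linear = pearson_r(x, y_linear)
r_u_shaped = pearson_r(x, y_u_shaped)
print(r_linear, r_u_shaped)
```

The straight line gives r = 1 and the U-shaped curve gives r = 0, so a scatter plot should always be inspected before concluding "no relationship".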

ANOVA — the comparison of three or more means

One-way ANOVA compares means across ≥3 independent groups using the F-test (ratio of between-group variance to within-group variance).

  • A significant ANOVA tells you that at least one group differs, but not which one.
  • Post-hoc tests (Tukey's HSD, Bonferroni, Scheffé) identify which specific pairs differ.

High-yield: Using multiple t-tests instead of ANOVA for ≥3 groups inflates the Type I error (α). ANOVA controls this. This is a favourite reasoning MCQ.
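
The inflation can be put in numbers. With g groups there are g(g − 1)/2 pairwise comparisons, and if each is tested at α the chance of at least one false positive is roughly 1 − (1 − α)^k. This sketch assumes the comparisons are independent, which is an approximation; the function names are hypothetical.

```python
from math import comb


def familywise_error(groups: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false positive when every pair of
    groups gets its own t-test (comparisons treated as independent)."""
    comparisons = comb(groups, 2)
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(groups: int, alpha: float = 0.05) -> float:
    """Per-comparison alpha after Bonferroni correction."""
    return alpha / comb(groups, 2)


for g in (3, 4, 5):
    print(g, comb(g, 2), round(familywise_error(g), 3))
# 3 groups ->  3 comparisons -> 0.143
# 4 groups ->  6 comparisons -> 0.265
# 5 groups -> 10 comparisons -> 0.401
```

So with four groups the nominal 5% error rate becomes about 26%, which is the reasoning behind choosing ANOVA first and a corrected post-hoc test afterwards.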

Z-test vs t-test

Feature | Z-test | t-test
Sample size | Large (n > 30) | Small (n ≤ 30)
Population SD | Known | Unknown (use sample SD)
Distribution used | Normal (Z) | Student's t-distribution

High-yield: Use the t-test for small samples with unknown population SD; the Z-test for large samples or when population SD is known. The t-distribution is flatter and broader, approaching normal as n increases.
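
The two statistics share the same numerator and differ only in which SD goes into the standard error. A minimal one-sample sketch, assuming hypothetical haemoglobin values, a hypothetical reference mean of 12.0 g/dL, and (for the Z version) a population SD of 0.5 that is taken as known:

```python
from math import sqrt
from statistics import mean, stdev


def z_statistic(sample, mu0, sigma):
    """Z = (sample mean - mu0) / (sigma / sqrt(n)), using the KNOWN population SD."""
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))


def t_statistic(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), using the SAMPLE SD.
    Returns the statistic and its degrees of freedom (n - 1)."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n)), n - 1


# Hypothetical haemoglobin values (g/dL) from 10 patients
hb = [11.2, 12.1, 10.8, 11.9, 12.4, 11.5, 10.9, 12.0, 11.7, 11.5]

t_value, df = t_statistic(hb, mu0=12.0)
z_value = z_statistic(hb, mu0=12.0, sigma=0.5)
print(f"t = {t_value:.2f} on {df} df; z = {z_value:.2f}")
```

The t value is then referred to the t-distribution with n − 1 degrees of freedom, whereas the Z value is referred to the standard normal curve.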

Putting it together — worked scenario logic

  1. Compare mean haemoglobin between two independent groups of pregnant women, data normal → Unpaired Student t-test.
  2. Compare BP before and after a drug in the same 20 patients → Paired t-test (or Wilcoxon signed-rank if skewed).
  3. Compare median pain scores (ordinal) across 3 treatment arms → Kruskal–Wallis test.
  4. Association between smoking (yes/no) and lung cancer (yes/no), large sample → Chi-square test.
  5. Same 2×2 association but only 25 subjects with a cell of 2 → Fisher's exact test.
  6. Compare the positivity rates of a new test vs an old test on the same patients (paired categorical) → McNemar's test (agreement itself is measured by kappa).
  7. Relationship between age and serum cholesterol (both continuous, normal) → Pearson correlation.

Tests of normality and other named tests

  • Shapiro–Wilk test and Kolmogorov–Smirnov test → check whether data are normally distributed (decide parametric vs non-parametric).
  • Levene's / Bartlett's test → check homogeneity of variance.
  • Kappa (κ) statistic → measures inter-observer agreement for categorical data beyond that expected by chance (Cohen's kappa). Values: ≤0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 very good.
  • Log-rank test → compares survival curves (Kaplan–Meier) between groups.
  • Cox proportional hazards regression → multivariable survival analysis giving hazard ratios.

High-yield: For survival/time-to-event data, the comparison test is the log-rank test and the regression model is Cox regression. For agreement between two observers on categorical data, use the kappa statistic; for two continuous methods, use the Bland–Altman plot.
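
Kappa corrects the observed agreement for the agreement expected by chance: κ = (Po − Pe) / (1 − Pe). A minimal sketch with a hypothetical 2×2 agreement table for two observers follows; the function names are illustrative, and the grading bands are the ones quoted in these notes.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table
    (rows: observer A's rating, columns: observer B's rating)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n  # Po: diagonal cells
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2  # Pe
    return (observed - expected) / (1 - expected)


def kappa_grade(kappa):
    """Grade kappa using the bands listed in these notes."""
    for upper, label in [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")]:
        if kappa <= upper:
            return label
    return "very good"


# Hypothetical data: two radiologists reading the same 100 films
#            B: abnormal  B: normal
ratings = [[40, 10],   # A: abnormal
           [5, 45]]    # A: normal

kappa = cohen_kappa(ratings)
print(round(kappa, 2), kappa_grade(kappa))  # Po = 0.85, Pe = 0.50, kappa = 0.70
```

Raw agreement here is 85%, but half of that would be expected by chance alone, so kappa is 0.70 ("good").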

Common pitfalls

  • Using a parametric test on skewed/ordinal data → misleading results; the median is more appropriate.
  • Multiple comparisons without correction → inflated false-positive rate; correct with Bonferroni.
  • Confusing paired with unpaired → an unpaired test on before–after data ignores the within-subject pairing, violates the independence assumption, and loses power.
  • Ignoring expected cell counts → applying chi-square when Fisher's exact is required.
  • Over-interpreting correlation as causation.
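
The paired-versus-unpaired pitfall can be seen numerically. The sketch below runs both statistics on the same hypothetical before–after systolic BP readings from six patients; the data and function names are made up for illustration, and the unpaired version is the pooled-variance Student t.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t: mean of within-subject differences over its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def unpaired_t(x, y):
    """Student's t with pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(y) - mean(x)) / sqrt(pooled_var * (1 / nx + 1 / ny))


# Hypothetical systolic BP (mmHg) in the same 6 patients
before = [150, 160, 145, 170, 155, 165]
after = [144, 155, 141, 163, 150, 158]

t_paired = paired_t(before, after)      # df = 5,  critical |t| = 2.571 at alpha 0.05
t_unpaired = unpaired_t(before, after)  # df = 10, critical |t| = 2.228 at alpha 0.05
print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

Every patient's BP fell by 4 to 7 mmHg, so the paired statistic is far beyond its critical value, while the unpaired statistic is swamped by between-patient variation and misses the effect.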

Key differentials (tests that get confused)

Often confused | Correct distinction
Unpaired t vs Mann–Whitney U | Same purpose (2 independent groups); choose by normality
Paired t vs Wilcoxon signed-rank | Same purpose (2 paired groups); choose by normality
Chi-square vs Fisher's exact | Sample size / expected count <5 → Fisher's
Chi-square vs McNemar | Independent → chi-square; paired/matched → McNemar
ANOVA vs Kruskal–Wallis | ≥3 groups; normal → ANOVA, non-normal → Kruskal–Wallis
Pearson vs Spearman | Continuous-normal → Pearson; ordinal/non-normal → Spearman

Recently asked / exam angle

NEET PG and INI-CET have repeatedly tested scenario-based selection rather than definitions:

  • "Non-parametric test for comparing two independent groups" → Mann–Whitney U (very frequent).
  • "Non-parametric equivalent of paired t-test" → Wilcoxon signed-rank.
  • "Best test for a 2×2 table with small numbers / expected frequency <5" → Fisher's exact test.
  • "Test for before-and-after categorical outcome in same patients" → McNemar's test.
  • "Test to compare means of four groups of normally distributed data" → One-way ANOVA, with a follow-up on why not multiple t-tests (Type I error).
  • "Correlation for ordinal data" → Spearman.
  • "Test to compare survival curves" → log-rank.
  • "Statistic for inter-observer agreement" → kappa.

The trick is always in the buzzwords: independent vs paired, small sample / expected <5, ordinal vs continuous, normal vs skewed, three or more groups, before-after.

Rapid revision

  • Quantitative + normal + 2 independent groups → unpaired t-test; if skewed → Mann–Whitney U.
  • Quantitative + normal + 2 paired groups → paired t-test; if skewed → Wilcoxon signed-rank.
  • Quantitative + ≥3 independent groups → ANOVA; if non-normal → Kruskal–Wallis.
  • ≥3 paired/repeated measures → repeated-measures ANOVA; if non-normal → Friedman test.
  • Categorical + large sample → chi-square; expected cell <5 → Fisher's exact.
  • Paired/matched categorical → McNemar's test.
  • Two continuous normal variables → Pearson; ordinal/non-normal → Spearman.
  • Non-parametric tests use ranks and medians, are robust to outliers, and have lower power when the data are actually normal.
  • Multiple t-tests for ≥3 groups inflate Type I error — use ANOVA.
  • Small sample, unknown population SD → t-test; large sample or known SD → Z-test.
  • Survival curves → log-rank; multivariable survival → Cox regression (hazard ratio).
  • Inter-observer agreement (categorical) → kappa statistic; binary outcome prediction → logistic regression (odds ratio).