Bias, Confounding & Causation
Community Medicine · Epidemiology · lean revision notes
Epidemiological studies aim to estimate the true association between an exposure and an outcome. Bias and confounding are the two great enemies of that truth, while the Bradford Hill criteria help us decide when an observed association is actually causal. This is a perennial NEET PG favourite, almost always asked as a clinical/research vignette demanding you name the specific error operating.
1. The big picture: why associations can be false
When a study reports an association, there are four possible explanations. You must rule out the first three before claiming causation.
Observed association → Is it chance? → Is it bias? → Is it confounding? → Only then: causation.
| Explanation | Nature | Controlled by |
|---|---|---|
| Chance (random error) | Random, due to sampling | Larger sample size, statistical testing (p-value, CI) |
| Bias (systematic error) | Systematic, built into design/conduct | Good study design; cannot be fixed in analysis |
| Confounding | Systematic, due to a third variable | Design or analysis stage |
| True causation | Real biological relationship | Establish via Bradford Hill |
High-yield: Random error → affects precision (corrected by ↑ sample size). Systematic error (bias) → affects validity/accuracy (cannot be corrected by increasing sample size). This precision-vs-validity distinction is heavily tested.
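The precision-versus-validity distinction can be seen with a little arithmetic. The sketch below assumes a hypothetical prevalence survey with a fixed systematic offset; `true_p`, `offset` and all numbers are illustrative, not taken from any study. The 95% CI half-width (normal approximation) shrinks as the sample grows, while the systematic error stays exactly where it was.

```python
import math

def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% CI half-width for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

true_p = 0.20                  # hypothetical true prevalence
offset = 0.05                  # hypothetical systematic error built into the method
measured_p = true_p + offset   # what the biased method converges on

for n in (100, 400, 1600):
    hw = ci_half_width(measured_p, n)
    bias = measured_p - true_p
    print(f"n={n:5d}  estimate={measured_p:.2f}  CI half-width={hw:.3f}  bias={bias:.2f}")
```

Quadrupling the sample halves the CI half-width (random error), but the bias column does not move: a larger study gives a more precise estimate of the wrong value.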
2. Bias — classification
Bias = any systematic error in design, conduct or analysis of a study that results in a mistaken estimate of the exposure–outcome association. Broadly, three families:
- Selection bias — error in selecting/retaining study subjects.
- Information (measurement/observation) bias — error in measuring exposure or outcome.
- Confounding — traditionally listed separately (a distortion by a third factor), discussed below.
2A. Selection bias
Arises when the subjects studied are not representative of the target population, or when comparison groups differ systematically in ways related to both exposure and outcome.
| Type | Setting | Classic description |
|---|---|---|
| Berkson's bias (admission rate bias) | Hospital-based case-control | Cases & controls selected from hospital; differential admission rates create spurious association between two unrelated diseases |
| Neyman bias (prevalence–incidence / survival bias) | Case-control using prevalent (survivor) cases | Rapidly fatal or quickly-cured cases are missed; only survivors studied → distorted exposure |
| Healthy worker effect | Occupational cohorts | Workers healthier than general population (sick people don't get/keep jobs) → underestimates harm |
| Non-response / volunteer bias | Surveys, screening | Responders/volunteers differ from non-responders |
| Loss to follow-up (attrition) bias | Cohort/RCT | Dropouts differ from those retained |
| Ascertainment / detection bias | Screening, surveillance | Exposed group monitored more closely → more disease detected |
High-yield: Berkson's = two hospitalised groups, spurious association. Neyman = prevalent/surviving cases, the rapidly fatal ones are lost. These two are the most commonly confused — Berkson is about admission, Neyman is about survival.
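Berkson's bias can be reproduced with expected proportions alone. The following is a minimal sketch under stated assumptions: two diseases A and B that are independent in the population, and hypothetical admission probabilities (`h_a`, `h_b`, `h_0`) that act independently. None of these values come from real data.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = both A and B, b = A only, c = B only, d = neither."""
    return (a * d) / (b * c)

# Hypothetical values: A and B are independent in the source population.
p_a, p_b = 0.10, 0.05               # prevalence of disease A and disease B
h_a, h_b, h_0 = 0.30, 0.50, 0.02    # admission probability due to A, due to B, other reasons

def p_admit(has_a, has_b):
    """Probability of being in hospital, assuming independent admission routes."""
    stay_out = (1 - h_0) * (1 - h_a if has_a else 1) * (1 - h_b if has_b else 1)
    return 1 - stay_out

# Population fraction in each cell (independence means product of prevalences).
pop = {
    (True, True): p_a * p_b,
    (True, False): p_a * (1 - p_b),
    (False, True): (1 - p_a) * p_b,
    (False, False): (1 - p_a) * (1 - p_b),
}
# Fraction of the population that is both in that cell and hospitalised.
hosp = {cell: frac * p_admit(*cell) for cell, frac in pop.items()}

population_or = odds_ratio(pop[True, True], pop[True, False],
                           pop[False, True], pop[False, False])
hospital_or = odds_ratio(hosp[True, True], hosp[True, False],
                         hosp[False, True], hosp[False, False])
print(f"population OR = {population_or:.2f}, hospital OR = {hospital_or:.2f}")
```

The population OR is 1 (no association), yet the OR among hospitalised people is far from 1. The size and direction of the distortion depend entirely on the admission probabilities assumed.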
2B. Information (measurement) bias
Arises from incorrect measurement of exposure or outcome.
| Type | Description | Example |
|---|---|---|
| Recall bias | Cases recall past exposure better than controls | Mothers of malformed babies recall drug intake more |
| Interviewer / observer bias | Interviewer probes exposed/cases differently | Knowing case status, interviewer asks leading questions |
| Reporting bias | Subjects under/over-report sensitive info | Under-reporting alcohol, smoking, sexual history |
| Misclassification bias | Subjects placed in wrong exposure/outcome category | Faulty diagnostic test labels diseased as healthy |
| Hawthorne effect | Subjects change behaviour because observed | Hygiene improves once participants know they are watched |
| Lead-time bias | Screening detects disease earlier → apparent ↑ survival without true benefit | Survival "from diagnosis" longer though death date unchanged |
| Length bias | Screening preferentially detects slow-growing (indolent) disease | Indolent cancers over-represented in screen-detected group |
High-yield: Recall bias is the classic limitation of case-control studies. Lead-time and length bias are the classic limitations of screening programmes — distinguish them: lead-time = earlier detection clock-starting; length = slow tumours preferentially caught.
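Lead-time bias is pure arithmetic on the timeline. The sketch below uses a hypothetical patient whose age at death is the same with or without screening; the ages are invented for illustration.

```python
# Hypothetical timeline (ages in years); screening does not change age at death.
age_screen_dx = 62      # diagnosis if picked up by screening
age_clinical_dx = 65    # diagnosis if left until symptoms appear
age_death = 70          # identical in both scenarios

survival_screened = age_death - age_screen_dx        # clock starts earlier
survival_unscreened = age_death - age_clinical_dx    # clock starts at symptoms
lead_time = age_clinical_dx - age_screen_dx

print(f"Survival from diagnosis: screened {survival_screened} y, "
      f"unscreened {survival_unscreened} y")
print(f"Apparent gain {survival_screened - survival_unscreened} y "
      f"= lead time {lead_time} y")
```

The whole of the apparent survival gain equals the lead time, which is why mortality rates rather than survival from diagnosis are the safer yardstick for a screening programme.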
Misclassification — differential vs non-differential
- Non-differential (random) misclassification — error equal across groups → biases estimate towards the null (dilutes a real effect). Tested fact.
- Differential misclassification — error differs by group (e.g., recall bias) → can bias in either direction (towards or away from null).
High-yield: Non-differential misclassification → almost always biases towards the null (RR/OR pushed towards 1). This is a frequent one-liner MCQ.
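The pull towards the null can be checked on a 2x2 table. This is a sketch with hypothetical counts and a hypothetical exposure measurement whose sensitivity and specificity are the same in cases and controls, which is what makes the misclassification non-differential.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts when exposure is measured imperfectly."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed

# Hypothetical true counts: (exposed, unexposed)
true_cases = (60, 40)
true_controls = (30, 70)
sens, spec = 0.8, 0.9   # same error rates in both groups (non-differential)

obs_cases = misclassify(*true_cases, sens, spec)
obs_controls = misclassify(*true_controls, sens, spec)

true_or = odds_ratio(*true_cases, *true_controls)
observed_or = odds_ratio(*obs_cases, *obs_controls)
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

With these numbers the true OR of 3.5 is diluted to about 2.4: still above 1, but closer to it. Giving cases and controls different sensitivity or specificity would make the error differential, and the observed OR could then land on either side of the truth.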
3. Confounding
A confounder is a third variable that is associated with the exposure and is an independent risk factor for the outcome, and is not an intermediate step in the causal pathway.
Three criteria for a confounder (all must be met):
- Associated with the exposure (in the source population).
- Independent risk factor for the outcome.
- Not an intermediate variable on the causal pathway between exposure and outcome.
Classic example: Coffee–lung cancer association is confounded by smoking (coffee drinkers smoke more; smoking causes lung cancer). Once you adjust for smoking, the coffee association vanishes.
High-yield: Age and smoking are the two most common confounders in NEET PG vignettes. If a question shows an association that disappears after "adjustment," the third variable is a confounder.
Confounding vs effect modification (interaction)
| Feature | Confounding | Effect modification (interaction) |
|---|---|---|
| What it is | A nuisance to be removed | A real biological phenomenon to be described |
| Effect on estimate | Distorts the true measure | Different effect size across strata |
| Stratified analysis | Stratum-specific estimates ≈ each other; adjusted (pooled) estimate differs from crude | Stratum-specific estimates differ from each other |
| Action | Control/remove it | Report it stratum-wise (don't "remove") |
High-yield: In effect modification, stratum-specific RRs differ from each other. In confounding, stratum-specific RRs are similar to each other but differ from the crude estimate.
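The stratified-table pattern can be made concrete with two small datasets. Both are hypothetical and constructed only to show the contrast: one where smoking confounds a coffee and lung cancer association, one where diabetes modifies a drug effect.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Hypothetical counts per stratum:
# (cases among exposed, number exposed, cases among unexposed, number unexposed)
confounding = {"smokers": (80, 800, 20, 200), "non-smokers": (2, 200, 8, 800)}
effect_mod = {"diabetics": (10, 100, 20, 100), "non-diabetics": (20, 100, 20, 100)}

def summarise(strata):
    """Return the crude RR (strata collapsed) and the stratum-specific RRs."""
    crude = risk_ratio(*(sum(col) for col in zip(*strata.values())))
    specific = {name: risk_ratio(*counts) for name, counts in strata.items()}
    return crude, specific

for label, data in (("confounding", confounding), ("effect modification", effect_mod)):
    crude, specific = summarise(data)
    per_stratum = ", ".join(f"{name} RR={rr:.2f}" for name, rr in specific.items())
    print(f"{label}: crude RR={crude:.2f}; {per_stratum}")
```

In the first dataset both stratum RRs are 1.0 while the crude RR is close to 3, the signature of confounding. In the second the stratum RRs are 0.5 and 1.0, so a single pooled figure would hide a real difference and the result should be reported stratum-wise.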
Methods to control confounding
Design stage (before/during data collection):
- Randomisation — gold standard; distributes known and unknown confounders equally (only in RCTs).
- Restriction — limit study to one category (e.g., only non-smokers); reduces generalisability.
- Matching — pair cases and controls on confounder (age, sex). Needs matched analysis (McNemar / conditional logistic regression).
Analysis stage (after data collected):
- Stratification — analyse within strata; use Mantel–Haenszel to pool adjusted estimate.
- Multivariate analysis — e.g., multiple logistic regression, Cox proportional hazards; adjusts for several confounders simultaneously.
- Standardisation — direct/indirect, classically for age.
Flow — controlling confounding: Design phase → Randomisation / Restriction / Matching → Analysis phase → Stratification (Mantel–Haenszel) / Multivariate regression / Standardisation.
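As an illustration of the analysis-stage route, here is a minimal sketch of the Mantel–Haenszel pooled odds ratio, computed as the sum of a·d/n over strata divided by the sum of b·c/n. The two strata (smokers, non-smokers) hold hypothetical counts chosen so that the exposure has no effect within either stratum.

```python
def mantel_haenszel_or(strata):
    """Pooled OR. Each stratum is (a, b, c, d) =
    exposed cases, exposed non-cases, unexposed cases, unexposed non-cases."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def crude_or(strata):
    """OR after collapsing all strata into one 2x2 table."""
    a, b, c, d = (sum(col) for col in zip(*strata))
    return (a * d) / (b * c)

# Hypothetical data: stratum 1 = smokers, stratum 2 = non-smokers.
strata = [(80, 720, 20, 180), (2, 198, 8, 792)]

print(f"crude OR = {crude_or(strata):.2f}")
print(f"Mantel-Haenszel OR = {mantel_haenszel_or(strata):.2f}")
```

The crude OR is about 3.1, while the stratum-adjusted OR is 1.0: the apparent association came entirely from the uneven distribution of smoking between the exposure groups.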
High-yield: Randomisation is the only method that controls unknown/unmeasured confounders — hence the RCT's supremacy. Restriction, matching, stratification and regression can only address known/measured confounders. This is the single most repeated fact in this topic.
Mnemonic for confounder control — "RM-SMS": Restriction, Matching, Stratification, Multivariate, Standardisation (+ Randomisation as the design overlord).
4. Validity & reliability (linked concept)
Often paired with bias in MCQs.
- Validity (accuracy) = measures what it intends to; threatened by bias (systematic error).
  - Internal validity — results true for the study population.
  - External validity (generalisability) — results applicable to wider population.
- Reliability (precision/repeatability) = consistency on repetition; threatened by random error.
High-yield: A study can be highly reliable but invalid (consistently wrong — e.g., a miscalibrated weighing scale). Validity ≠ reliability. Restriction improves internal validity but reduces external validity.
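The miscalibrated-scale example can be put into numbers. The readings below are hypothetical: a person whose true weight is assumed to be 70 kg is weighed repeatedly on a scale that reads about 2 kg high.

```python
import statistics

true_weight = 70.0                                  # assumed true value (kg)
readings = [72.1, 71.9, 72.0, 72.0, 72.1, 71.9]     # hypothetical repeated readings

bias = statistics.mean(readings) - true_weight      # systematic error: validity
spread = statistics.stdev(readings)                 # random scatter: reliability

print(f"mean error (bias) = {bias:.2f} kg")
print(f"SD of readings    = {spread:.2f} kg")
```

The tiny standard deviation shows high reliability, while the mean error of about 2 kg shows poor validity. Taking more readings would narrow the scatter further but would not remove the 2 kg offset.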
5. Bradford Hill criteria for causation
Once chance, bias and confounding have been ruled out, the nine viewpoints proposed by Sir Austin Bradford Hill (1965) are used to judge whether an association is causal. None is individually sufficient; temporality is the only absolute requirement.
| # | Criterion | Meaning |
|---|---|---|
| 1 | Temporality | Cause must precede effect (only essential criterion) |
| 2 | Strength | Larger RR/OR → more likely causal |
| 3 | Consistency | Repeatable across studies, populations, settings |
| 4 | Biological gradient | Dose–response relationship |
| 5 | Specificity | One cause → one effect (weakest, often violated) |
| 6 | Biological plausibility | Consistent with known biology |
| 7 | Coherence | Doesn't conflict with natural history/known facts |
| 8 | Experiment / reversibility | Removing exposure ↓ disease |
| 9 | Analogy | Similar agents cause similar effects |
High-yield: Temporality is the sine qua non — the only criterion that MUST be satisfied. Strength and biological gradient (dose-response) are strong supporters. Specificity is the weakest/least useful (most diseases are multifactorial; smoking causes many diseases).
Mnemonic — "Timmy's Strong Consistent Biological Specific Plausible Coherent Experiment Analogy" or simply remember the lead trio: Temporality (must), Strength, Dose-response.
High-yield: A cohort study and an RCT can establish temporality (exposure measured before outcome). A case-control study cannot reliably establish temporality — a key reason it's weaker for causation.
6. Worked vignette logic (how MCQs phrase it)
- "Cases and controls both drawn from a hospital, spurious link found" → Berkson's bias.
- "Only surviving patients of MI studied; fatal cases missed" → Neyman (survival) bias.
- "Mothers of children with birth defects recall drug use better" → Recall bias.
- "Screen-detected cancers appear to survive longer though death timing unchanged" → Lead-time bias.
- "Screening picks up mostly slow-growing tumours" → Length bias.
- "Factory workers healthier than general public" → Healthy worker effect.
- "Association of coffee with CHD disappears after adjusting for smoking" → Confounding (smoking).
- "Effect of drug differs in diabetics vs non-diabetics" → Effect modification.
- "Behaviour improved because subjects knew they were observed" → Hawthorne effect.
7. Complications / consequences of unaddressed errors
- Spurious associations leading to wrong public-health policy.
- Masking of true effects (non-differential misclassification dilutes real RR towards null).
- Misleading screening "benefit" (lead-time/length bias inflating apparent survival).
- Non-reproducible research and wasted resources.
- Harm to patients if causal claims are acted on prematurely.
8. Key differentials / discriminators (don't confuse these)
| Pair | Key discriminator |
|---|---|
| Selection vs information bias | Who you pick vs how you measure |
| Berkson vs Neyman | Admission rate vs survival |
| Confounding vs effect modification | Remove it vs report it (strata similar to each other but different from crude vs strata differ from each other) |
| Lead-time vs length bias | Earlier detection vs indolent-tumour over-detection |
| Random vs systematic error | Precision/sample-size vs validity/design |
| Validity vs reliability | Accuracy vs repeatability |
Recently asked / exam angle
- Identify the specific bias in a described case-control study (Berkson's, Neyman, recall) — the single most common stem.
- Only criterion essential for causation → Temporality (repeated almost every cycle).
- Best/only method to control unknown confounders → Randomisation.
- Non-differential misclassification biases towards → the null.
- Direction of bias from non-differential (random) vs differential misclassification.
- Weakest Bradford Hill criterion → Specificity.
- Distinguishing confounding from effect modification using stratified RR tables.
- Healthy worker effect in occupational epidemiology stems.
- Lead-time vs length bias in screening-programme questions.
- Mantel–Haenszel as the technique for pooled stratum-adjusted estimate.
- Which methods control confounding at the design stage vs the analysis stage.
Rapid revision
- Random error → precision (fix with sample size); systematic error/bias → validity (fix with design, NOT sample size).
- Berkson's bias = both groups from hospital → spurious association (admission rate bias).
- Neyman bias = prevalence-incidence/survival bias; fatal cases missed in case-control.
- Recall bias = classic flaw of case-control studies.
- Lead-time bias (earlier detection) and length bias (indolent tumours) plague screening.
- Healthy worker effect underestimates occupational harm.
- A confounder must satisfy 3 criteria: linked to exposure, independent risk factor for outcome, NOT on causal pathway.
- Non-differential misclassification → bias towards the null (dilutes true effect).
- Randomisation is the ONLY method controlling unknown confounders.
- Restriction/matching = design stage; stratification (Mantel–Haenszel) / multivariate regression = analysis stage.
- Effect modification → stratum-specific RRs differ from each other (report, don't remove).
- Temporality is the ONLY essential Bradford Hill criterion; specificity is the weakest; strength + dose-response are strong supporters.