Study Designs in Epidemiology

Epidemiological study designs are the scaffolding on which all evidence about disease causation, prevention, and treatment is built. For NEET PG, the recurring trick is to give you a research scenario and ask which design fits, which measure of association applies, and which bias threatens it. Master the logic below and these become free marks.

Broad Classification

Studies are first split by whether the investigator intervenes.

| Category | Investigator role | Examples |
| --- | --- | --- |
| Observational – Descriptive | Only describes (no comparison group) | Case report, case series, cross-sectional (descriptive), ecological |
| Observational – Analytical | Observes with a comparison group | Cross-sectional (analytical), case-control, cohort |
| Experimental (Interventional) | Actively assigns exposure/treatment | RCT, field trial, community trial |

A second axis is direction:

  • Exposure → Disease (forward): cohort, experimental
  • Disease → Exposure (backward): case-control
  • Exposure and Disease measured simultaneously: cross-sectional

High-yield: The single best discriminator between cohort and case-control is the starting point. Cohort starts with exposure status (exposed vs non-exposed) and looks forward to disease. Case-control starts with disease status (cases vs controls) and looks backward to exposure.

Hierarchy of Evidence

A classic exam stem asks "which gives the strongest causal evidence?"

Case report/series → Ecological → Cross-sectional → Case-control → Cohort → RCT → Systematic review/Meta-analysis (top)

High-yield: Meta-analysis of RCTs sits at the apex of the evidence pyramid; case reports at the base. The RCT is the single strongest individual study design for establishing causation because randomisation controls confounding (known and unknown).


1. Descriptive Studies

Case report and case series

A detailed description of one (report) or several (series) patients. No comparison group, so no rate or association can be computed. They are hypothesis-generating and are often the first signal of a new disease (e.g., the original cluster of Pneumocystis pneumonia that flagged AIDS).

Ecological (correlational) study

The unit of analysis is a population/group, not the individual — e.g., correlating per-capita fat intake with breast-cancer mortality across countries.

  • Measure used: correlation coefficient (r).
  • Strength: cheap, fast, uses existing aggregate data; good for generating hypotheses and for exposures with little individual variation (e.g., air pollution).
  • Fatal flaw: the ecological fallacy — an association at group level need not hold at the individual level.

High-yield: Ecological fallacy = drawing individual-level conclusions from group-level data. This is the most tested single fact about ecological studies.
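
As a rough illustration of the measure used in an ecological study, the sketch below computes a Pearson correlation coefficient from group-level data. The country figures are hypothetical numbers made up for the example, and `pearson_r` is an illustrative helper, not something taken from these notes.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical aggregate data: one value per country (the unit is the group).
fat_intake_g_per_day = [40, 60, 80, 100, 120]
breast_cancer_deaths_per_100k = [6, 9, 14, 16, 22]

r = pearson_r(fat_intake_g_per_day, breast_cancer_deaths_per_100k)
print(f"r across countries = {r:.2f}")
```

Even a high r here describes countries, not women: nothing in the data says that the individuals who died were the ones eating more fat, which is exactly the ecological fallacy.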


2. Cross-sectional Study (Prevalence Study)

Exposure and outcome are measured at the same point in time ("snapshot"). It is the standard design for surveys (e.g., National Family Health Survey).

  • Measure of frequency: Prevalence (this is the only design that directly gives prevalence).
  • Measure of association (analytical type): Prevalence ratio / Prevalence Odds Ratio.
  • Strengths: quick, inexpensive, good for planning health services and estimating disease burden; can study several exposures and outcomes at once.
  • Limitations: cannot establish temporality (chicken-or-egg), so causation cannot be inferred; underestimates short-duration diseases; subject to Neyman (prevalence–incidence) bias.

High-yield: Cross-sectional study measures prevalence, NOT incidence. It is best for chronic diseases (long duration) and useless for rapidly fatal or short illnesses.


3. Case-Control Study

Starts with diseased (cases) and non-diseased (controls), then looks backward to compare past exposure. Direction is outcome → exposure.

Design essentials

  • Best for: rare diseases and diseases with long latency (e.g., cancers).
  • Measure of association: Odds Ratio (OR) — incidence/relative risk cannot be directly calculated because the investigator fixes the number of cases and controls (the denominator population is unknown).
  • Allows study of multiple exposures for a single disease.

2×2 table and OR

| | Cases (D+) | Controls (D−) |
| --- | --- | --- |
| Exposed (E+) | a | b |
| Non-exposed (E−) | c | d |

Odds Ratio = (a × d) / (b × c) — the cross-product ratio.

High-yield: When the disease is rare (<10%), the OR approximates the Relative Risk (rare disease assumption). This is a perennial favourite.
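
The cross-product formula and the rare-disease assumption can be checked with a small sketch. The counts below are hypothetical and are laid out as a full cohort so that both OR and RR can be computed from the same table; the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio from a 2x2 table.

    a = exposed with disease, b = exposed without disease,
    c = non-exposed with disease, d = non-exposed without disease.
    """
    if b == 0 or c == 0:
        raise ValueError("b and c must be non-zero to compute the odds ratio")
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk ratio; only meaningful when the rows are true cohorts."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: disease is rare (risk of 2% vs 1%).
rare = dict(a=20, b=980, c=10, d=990)
# Hypothetical counts: disease is common (risk of 40% vs 20%).
common = dict(a=400, b=600, c=200, d=800)

for label, table in (("rare", rare), ("common", common)):
    print(label,
          "OR =", round(odds_ratio(**table), 2),
          "RR =", round(relative_risk(**table), 2))
```

With the rare-disease counts the OR (about 2.02) sits very close to the RR (2.0); with the common-disease counts the OR (about 2.67) overstates the same RR of 2.0.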

Strengths

Cheap, quick, small sample needed, ideal for rare disease, multiple exposures, no loss to follow-up (retrospective).

Limitations & characteristic biases

  • Cannot calculate incidence or RR.
  • Recall bias — cases remember exposures better than controls (the signature bias of case-control studies).
  • Selection bias, especially Berksonian bias when both cases and controls are drawn from hospital populations.
  • Temporal sequence can be uncertain.

High-yield: Recall bias is the hallmark bias of case-control studies; Berkson bias is the selection bias arising from hospital-based controls.


4. Cohort Study (Follow-up / Longitudinal / Incidence Study)

Starts with exposure status in disease-free individuals, then follows forward to compare disease incidence in exposed vs non-exposed. Direction is exposure → outcome.

Types

| Type | Timing | Example |
| --- | --- | --- |
| Prospective | Cohort assembled now, followed into the future | Framingham Heart Study |
| Retrospective (historical) | Exposure already occurred; outcomes traced from old records | Occupational cohort using past employment records |
| Ambispective | Both retrospective + prospective components | |

Measures of association

| | Disease + | Disease − | Total |
| --- | --- | --- | --- |
| Exposed | a | b | a+b |
| Non-exposed | c | d | c+d |

  • Incidence in exposed = a/(a+b); incidence in non-exposed = c/(c+d).
  • Relative Risk (RR) = [a/(a+b)] / [c/(c+d)] — strength of association.
  • Attributable Risk (AR) = Ie − Io — excess risk due to exposure (public-health/preventive importance).
  • Attributable Risk % = (Ie − Io)/Ie × 100.
  • Population Attributable Risk (PAR) = risk in total population attributable to the exposure — guides community-level intervention priorities.
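
The measures listed above can be worked through on one 2×2 table. This is a minimal sketch with hypothetical counts; the notes define PAR only in words, so the formula used here (incidence in the total population minus incidence in the non-exposed, It − Io) is the common textbook form and should be read as an assumption.

```python
def cohort_measures(a, b, c, d):
    """Association measures from a cohort 2x2 table.

    a, b = exposed with / without disease
    c, d = non-exposed with / without disease
    """
    ie = a / (a + b)                   # incidence in exposed
    io = c / (c + d)                   # incidence in non-exposed
    it = (a + c) / (a + b + c + d)     # incidence in total population
    return {
        "Ie": ie,
        "Io": io,
        "RR": ie / io,
        "AR": ie - io,
        "AR_percent": (ie - io) / ie * 100,
        "PAR": it - io,                # assumed formula: It - Io
    }


# Hypothetical cohort: 100 exposed (30 diseased), 100 non-exposed (10 diseased).
result = cohort_measures(a=30, b=70, c=10, d=90)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

For these counts Ie = 0.30 and Io = 0.10, giving RR = 3, AR = 0.20, AR% of about 66.7, and PAR = 0.10.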

High-yield: RR measures the strength of association (etiological force); AR measures the public-health impact / benefit of removing the exposure. Cohort studies are the only observational design that directly yields incidence and RR.

Strengths

Establishes temporality (strongest observational evidence for causation), calculates incidence/RR/AR, good for rare exposures, can study multiple outcomes of one exposure, minimal recall bias.

Limitations

Expensive, time-consuming, large sample, loss to follow-up (major threat — the characteristic problem), not suitable for rare diseases, prospective ones take years.

High-yield: Cohort = best for rare exposure and multiple outcomes; Case-control = best for rare disease and multiple exposures. This mirror-image pair is asked almost every year.


5. Experimental Studies

The investigator allocates the exposure/intervention. The gold standard is the RCT.

Randomised Controlled Trial (RCT)

  • Randomisation balances confounders (known and unknown) between arms — the unique strength.
  • Blinding (single/double/triple) minimises observer and information bias.
  • Allocation concealment prevents selection bias at the point of enrolment (distinct from blinding).
  • Intention-to-treat (ITT) analysis: analyse participants in the group to which they were randomised, regardless of compliance; this preserves randomisation and guards against attrition/compliance bias. Per-protocol analysis (only completers) tends to overestimate efficacy.

High-yield: Allocation concealment ≠ blinding. Concealment happens before/at randomisation (hiding the next assignment); blinding happens after allocation (hiding which arm a subject is in). ITT is the analysis of choice to maintain the benefits of randomisation.
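
To see why ITT is preferred, the sketch below compares ITT and per-protocol event rates in a small, entirely hypothetical trial in which the treatment-arm dropouts are the patients who did badly. The `Participant` class and `event_rate` helper are illustrative names, not part of any standard package.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    arm: str         # arm assigned at randomisation: "treatment" or "control"
    completed: bool  # followed the protocol to the end
    event: bool      # adverse outcome occurred


def event_rate(participants, arm, per_protocol=False):
    """Event rate in one arm; per_protocol=True keeps completers only."""
    group = [p for p in participants if p.arm == arm]
    if per_protocol:
        group = [p for p in group if p.completed]
    return sum(p.event for p in group) / len(group)


# Hypothetical trial, 10 per arm. Three treated patients dropped out
# and all three went on to have the adverse outcome.
trial = (
    [Participant("treatment", True, False)] * 6
    + [Participant("treatment", True, True)] * 1
    + [Participant("treatment", False, True)] * 3
    + [Participant("control", True, False)] * 6
    + [Participant("control", True, True)] * 4
)

for label, pp in (("ITT", False), ("Per-protocol", True)):
    print(label,
          "treatment:", round(event_rate(trial, "treatment", pp), 2),
          "control:", round(event_rate(trial, "control", pp), 2))
```

ITT shows the same event rate in both arms (0.4 vs 0.4), while per-protocol makes the treatment look protective (about 0.14 vs 0.4) simply because the failures were excluded.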

Other trial types

| Trial | Unit randomised | Example |
| --- | --- | --- |
| Clinical trial | Individual patients | New drug vs placebo |
| Field trial | Healthy individuals | Vaccine trial in the community |
| Community trial | Whole communities/groups | Water fluoridation, salt iodisation |

Trial measures

  • Relative Risk Reduction (RRR) = 1 − RR.
  • Absolute Risk Reduction (ARR) = control event rate − treatment event rate.
  • Number Needed to Treat (NNT) = 1/ARR — patients treated to prevent one adverse outcome (lower = better drug).
  • Number Needed to Harm (NNH) = 1/ARI, where ARI (Absolute Risk Increase) = treatment event rate − control event rate.

High-yield: NNT = 1/ARR. A small NNT means a more effective intervention. NNT uses ARR, never RRR.
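
The trial measures chain together from just two event rates, as the sketch below shows. The rates are hypothetical, and rounding NNT up to the next whole patient is a common reporting convention assumed here rather than something stated in these notes.

```python
import math


def nnt(arr):
    """Number needed to treat from an absolute risk reduction (a proportion)."""
    if arr <= 0:
        raise ValueError("ARR must be positive for NNT to be defined")
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(1 / arr, 6))


def trial_measures(control_event_rate, treatment_event_rate):
    """RR, RRR, ARR and NNT from the event rates in the two arms."""
    rr = treatment_event_rate / control_event_rate
    arr = control_event_rate - treatment_event_rate
    return {"RR": rr, "RRR": 1 - rr, "ARR": arr, "NNT": nnt(arr)}


# Hypothetical trial: 10% events on placebo, 8% on the new drug.
print(trial_measures(control_event_rate=0.10, treatment_event_rate=0.08))
print(nnt(0.02))
```

Here RR is about 0.8, RRR about 0.2 (20%), ARR about 0.02, and NNT = 50. Note that the 20% RRR sounds impressive, yet 50 patients must be treated to prevent one event, which is why NNT is built on ARR.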


Choosing the Right Design — A Decision Flow

Is there an intervention by the investigator? Yes → Experimental (RCT/field/community trial). No → Observational, so next ask:

Is there a comparison group? No → Descriptive (case report/series, ecological, descriptive cross-sectional). Yes → Analytical, so ask: what is the starting point?

  1. Start with prevalence/snapshot → Cross-sectional.
  2. Start with disease (rare disease, long latency, multiple exposures, quick/cheap) → Case-control → OR.
  3. Start with exposure (rare exposure, multiple outcomes, need incidence/temporality) → Cohort → RR, AR.
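
The decision flow above can be written as a tiny function, which is a handy way to drill scenario questions. This is only a sketch of the three questions in the flow; the function name and the `starting_point` labels are made up for illustration.

```python
def choose_design(investigator_intervenes, has_comparison_group=False,
                  starting_point=None):
    """Walk the decision flow and return the matching study design.

    starting_point is one of "snapshot", "disease" or "exposure"
    and is only needed for analytical observational studies.
    """
    if investigator_intervenes:
        return "Experimental (RCT / field trial / community trial)"
    if not has_comparison_group:
        return "Descriptive (case report/series, ecological, descriptive cross-sectional)"
    analytical = {
        "snapshot": "Cross-sectional",
        "disease": "Case-control (measure: OR)",
        "exposure": "Cohort (measures: RR, AR)",
    }
    if starting_point not in analytical:
        raise ValueError("starting_point must be 'snapshot', 'disease' or 'exposure'")
    return analytical[starting_point]


# Stem: lung-cancer patients and healthy controls are asked about past smoking.
print(choose_design(False, has_comparison_group=True, starting_point="disease"))
```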

High-yield: First-occurrence/new outbreak of unknown cause → case-control is usually the first analytical design (rapid, cheap). To confirm causation → cohort, then RCT if ethical.


Measures of Association — Master Table

| Design | Frequency measure | Association measure | Causation (temporality)? |
| --- | --- | --- | --- |
| Ecological | None (aggregate rates only) | Correlation (r) | No (ecological fallacy) |
| Cross-sectional | Prevalence | Prevalence ratio / OR | No |
| Case-control | None (incidence cannot be calculated) | Odds Ratio | Weak/uncertain |
| Cohort | Incidence | RR, AR, PAR | Yes (strong) |
| RCT | Incidence | RR, RRR, ARR, NNT | Strongest |

High-yield: OR is the only valid association measure in case-control; RR/AR require a cohort or RCT; prevalence is unique to cross-sectional.


Bias by Design (very frequently tested)

| Bias | Definition | Classic design affected |
| --- | --- | --- |
| Recall bias | Differential recall of exposure | Case-control |
| Berksonian bias | Hospital cases & controls differ from community | Case-control (hospital-based) |
| Selection bias | Non-representative sampling | Any, esp. case-control |
| Neyman (prevalence-incidence) bias | Missing fatal/cured cases | Cross-sectional |
| Lead-time bias / Length bias | Apparent survival gain from earlier/indolent detection | Screening evaluation |
| Healthy worker effect | Workers healthier than general population | Occupational cohort |
| Confounding | Third variable distorts exposure–outcome link | All observational; controlled by randomisation in RCT |
| Hawthorne effect | Subjects change behaviour because observed | Trials/observational |
| Loss to follow-up (attrition) | Differential dropout | Cohort, RCT |

High-yield: Randomisation is the only technique that controls unknown/unmeasured confounders. Matching, stratification, restriction, and multivariable adjustment handle only known confounders.


Establishing Causation — Bradford Hill Criteria

When an association is found, judge causation using Bradford Hill's criteria: Temporality (the only essential/mandatory one), Strength, Dose–response (biological gradient), Consistency, Specificity, Biological plausibility, Coherence, Experimental evidence, Analogy.

Mnemonic: "Tempting Strong Dose Could Cause Some Bad Effects, Anyway" (Temporality, Strength, Dose-response, Consistency, Coherence, Specificity, Biological plausibility, Experiment, Analogy).

High-yield: Temporality (cause precedes effect) is the only indispensable Hill criterion. Cohort and RCT satisfy it; cross-sectional and ecological do not.


Key Differentials — Don't Confuse These

  • Incidence vs Prevalence: Incidence = new cases (cohort/RCT); Prevalence = existing cases (cross-sectional). Prevalence = Incidence × Duration (holds for a stable population with low prevalence).
  • RR vs OR: RR is a true risk ratio (cohort/RCT); OR is an odds ratio (case-control); OR ≈ RR only when disease is rare.
  • RR vs AR: RR = strength of association; AR = excess risk attributable to exposure (preventive value).
  • Field trial vs Community trial: Field = healthy individuals (vaccine); Community = whole groups (fluoridation).
  • Allocation concealment vs Blinding: before vs after randomisation.
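
The Prevalence = Incidence × Duration relation explains why a cross-sectional survey picks up chronic disease well and short illnesses poorly. The sketch below uses hypothetical rates and assumes a steady state (stable incidence and duration); the function name is illustrative.

```python
def steady_state_prevalence(incidence_per_1000_per_year, mean_duration_years):
    """Approximate point prevalence per 1000 under steady-state conditions."""
    return incidence_per_1000_per_year * mean_duration_years


# Same hypothetical incidence (10 per 1000 per year), very different durations.
chronic = steady_state_prevalence(10, 5)      # illness lasting about 5 years
acute = steady_state_prevalence(10, 0.02)     # illness lasting about 1 week

print(f"chronic: {chronic:.1f} per 1000, acute: {acute:.1f} per 1000")
```

With identical incidence, the chronic illness shows a prevalence of 50 per 1000 while the week-long illness shows only 0.2 per 1000, so a snapshot survey would barely register it.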

Recently asked / exam angle

  • "A study begins with lung-cancer patients and healthy controls and asks about past smoking" → case-control; measure = Odds Ratio; cannot calculate RR.
  • "Per-capita sugar intake correlated with diabetes mortality across 30 countries" → ecological study; pitfall = ecological fallacy.
  • "Framingham study type" → prospective cohort, yields incidence and RR.
  • "Which design best for a rare disease?" → case-control; for a rare exposure → cohort.
  • "Best design to estimate disease burden/prevalence for health planning" → cross-sectional survey.
  • "ARR = 0.02, find NNT" → NNT = 1/0.02 = 50.
  • "Which analysis preserves randomisation despite dropouts?" → Intention-to-treat.
  • "Only method controlling unknown confounders" → randomisation.
  • "Which is the only mandatory Bradford Hill criterion?" → Temporality.
  • "Hospital-based cases and controls give which bias?" → Berksonian bias.

Rapid revision

  1. Cohort starts with exposure → forward → gives incidence, RR, AR; best for rare exposure & multiple outcomes.
  2. Case-control starts with disease → backward → gives OR only; best for rare disease & multiple exposures; main bias = recall bias.
  3. OR ≈ RR when disease is rare (rare-disease assumption).
  4. Cross-sectional gives prevalence, never incidence; cannot prove causation; suited to chronic disease; risk of Neyman bias.
  5. Ecological study unit = population; pitfall = ecological fallacy.
  6. RR = strength of association; AR = preventive/public-health impact; PAR guides community intervention.
  7. NNT = 1/ARR; lower NNT = better treatment; never use RRR for NNT.
  8. Randomisation alone controls unknown confounders; ITT analysis preserves it.
  9. Allocation concealment (pre-randomisation) ≠ blinding (post-allocation).
  10. Evidence pyramid apex = meta-analysis of RCTs; base = case report.
  11. Temporality is the only obligatory Bradford Hill criterion.
  12. Prevalence = Incidence × Duration.