Sampling Methods & Sample Size

Community Medicine · Biostatistics · lean revision notes

Sampling is the process of selecting a representative subset (the sample) from a larger population so that conclusions drawn from the sample can be generalised back to the whole. This topic is a perennial NEET PG favourite in Biostatistics: questions are almost always scenario-based — "which sampling method is this?" or "which method gives the least sampling error?" — so the trick is pattern recognition, not rote memory.

Why we sample — basic definitions

A population is the entire group about whom we want information (e.g., all diabetics in a district). A sample is the part of the population actually studied. We sample because studying everyone (a census) is costly, slow, and often impossible.

Sampling unit — the basic element selected (a person, household, village, school).
Sampling frame — the complete list of all sampling units from which the sample is drawn (e.g., electoral roll, hospital register).
Sampling fraction — sample size ÷ population size (n/N).
Sampling error — the difference between the sample estimate and the true population value that arises purely by chance because we studied only a sample. It is reduced by larger samples and by probability methods. It cannot be eliminated but can be quantified.
Non-sampling error — error from faulty measurement, bias, non-response, recall, or data entry. It is not reduced by increasing sample size and is often the larger problem in real studies.

High-yield: Sampling error decreases as sample size increases; non-sampling (systematic) error does NOT decrease with sample size. A census has zero sampling error but can still have huge non-sampling error.
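A minimal sketch of how chance error shrinks with n. It assumes simple random sampling and uses the standard-error formula for a proportion, SE = √(pq/n), which is a standard result rather than something stated in these notes; the function name is illustrative.

```python
import math

def standard_error_of_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# Assumed prevalence of 20%; each step quadruples the sample size.
for n in (100, 400, 1600):
    se = standard_error_of_proportion(0.2, n)
    print(f"n = {n:>4}: SE = {se:.3f}")
```

Each fourfold increase in n halves the standard error (0.040, 0.020, 0.010). The formula has no term for bias, which is why a larger sample leaves non-sampling error untouched.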

Classification of sampling methods

Sampling methods split into two big families.

Feature	Probability (random) sampling	Non-probability sampling
Selection	Every unit has a known, non-zero chance	Selection by convenience/judgement
Sampling error	Can be calculated	Cannot be calculated
Generalisability	High (representative)	Low (prone to bias)
Examples	Simple random, systematic, stratified, cluster, multistage	Convenience, purposive, quota, snowball
Use	Surveys, RCTs, prevalence studies	Qualitative, pilot, hard-to-reach groups

High-yield: Only probability sampling allows you to estimate sampling error and apply tests of significance. If a question says "sampling error can be measured," it is a probability method.

Probability sampling methods

1. Simple random sampling (SRS)

Every unit has an equal and independent chance of selection. Done using a lottery method, random number tables, or computer-generated random numbers.

Needs: a complete sampling frame (list of all units).
Advantage: least biased; the theoretical "gold standard" of representativeness.
Limitation: impractical for large/dispersed populations; needs a full frame; may by chance miss small subgroups.

High-yield: When choices include SRS and the population is small with a complete list available, SRS is the least biased option. It is the reference standard against which other methods are judged; for a heterogeneous population, stratified sampling still gives the smaller sampling error.
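The "computer-generated random numbers" route can be sketched as follows. This is an illustration only: the register of 200 patients and the function name are hypothetical, and Python's standard `random.sample` stands in for the lottery.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(frame), n)

# Hypothetical sampling frame: a complete register of 200 patients.
sampling_frame = [f"patient_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(sampling_frame, n=20, seed=42)
print(len(sample), len(set(sample)))  # 20 units, all distinct
```

The code cannot run without the full list, which mirrors the method's main practical limitation.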

2. Systematic sampling

Select every kth unit after a random start. Sampling interval k = N/n (population size ÷ desired sample size).

Flow: Determine N and n → compute k = N/n → pick a random start between 1 and k → then select every kth unit.

Example: N = 1000, n = 100, so k = 10. Random start = 7 → select 7, 17, 27, 37…
Advantage: simple, fast, spreads sample evenly; does not need the full list in advance (can pick "every 10th patient as they arrive").
Limitation: periodicity bias — if the list has a hidden cyclical pattern matching the interval k, the sample becomes biased (e.g., every 7th house being a corner house, or every 10th patient being admitted on a particular weekday).

High-yield: The major pitfall of systematic sampling is periodicity in the sampling frame. Otherwise it is nearly as good as SRS.
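The flow above can be sketched in a few lines. The function name is illustrative, and the sketch assumes N is an exact multiple of n so that k is a whole number.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return 1-based positions: a random start in 1..k, then every kth unit."""
    k = N // n  # sampling interval; assumes N is a multiple of n
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]

# Reproduces the example in the notes: N = 1000, n = 100, random start = 7.
positions = systematic_sample(N=1000, n=100, start=7)
print(positions[:4], positions[-1])  # [7, 17, 27, 37] 997
```

Because only the start is random, any cycle in the list with period k lines up with every selected position, which is the periodicity problem.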

3. Stratified sampling

The population is first divided into homogeneous, mutually exclusive strata (e.g., by age, sex, socio-economic class, urban/rural). A random sample is then drawn from each stratum.

Proportionate stratified: sample from each stratum ∝ its size in the population.
Disproportionate stratified: smaller/important strata are over-sampled to ensure adequate representation.
Advantage: ensures representation of every subgroup; gives the most precise estimate (least sampling error) when the population is heterogeneous between strata but homogeneous within; allows separate estimates per stratum.
Limitation: requires prior knowledge of strata; more administrative work.

High-yield: When the population is heterogeneous (e.g., different income groups, age bands), stratified random sampling gives the most representative sample / least sampling error. This is one of the single most-tested facts on this topic.
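A minimal sketch of proportionate stratified sampling. The income strata, their sizes, and the function names are hypothetical; simple rounding is used for allocation, so in general the stratum samples may not add up exactly to n.

```python
import random

def proportionate_allocation(stratum_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size."""
    N = sum(stratum_sizes.values())
    # Simple rounding: totals can be off by one or two for awkward sizes.
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample inside every stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportionate_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}

# Hypothetical population of 1000 split into three income strata.
strata = {
    "low": [f"L{i}" for i in range(500)],
    "middle": [f"M{i}" for i in range(300)],
    "high": [f"H{i}" for i in range(200)],
}
sample = stratified_sample(strata, n=100, seed=1)
print({name: len(units) for name, units in sample.items()})
```

With these sizes the allocation is 50, 30, and 20, so every stratum is represented in proportion. For disproportionate sampling, the allocation step would be replaced by fixed per-stratum targets.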

4. Cluster sampling

The population is divided into naturally occurring clusters (e.g., villages, schools, urban wards). A random sample of whole clusters is selected, and all units within selected clusters are studied.

Advantage: no need for a complete list of individuals — only a list of clusters; cheap and feasible for large, geographically scattered populations.
Limitation: highest sampling error of the probability methods, because units within a cluster tend to be similar (low within-cluster variability). Requires a larger sample for the same precision.

High-yield: Among probability methods, cluster sampling has the LARGEST sampling error (least precise) but is the most economical for widely dispersed populations. The classic WHO EPI 30×7 immunisation coverage survey (30 clusters of 7 children each) is the standard example.
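A sketch of the "select whole clusters, study everyone inside" form described above. The villages, household labels, and function name are hypothetical; this is the one-stage design, not the WHO 30×7 procedure.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # only a list of clusters is needed
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical frame: 50 villages of 40 households each.
villages = {
    f"village_{v:02d}": [f"v{v:02d}_hh{h:02d}" for h in range(40)]
    for v in range(50)
}
chosen, households = cluster_sample(villages, n_clusters=5, seed=7)
print(len(chosen), len(households))  # 5 villages, 200 households
```

Randomisation happens only at the village level, so the 200 households carry less independent information than 200 households drawn by SRS.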

5. Multistage sampling

Sampling carried out in successive stages, each stage sampling from the units selected in the previous stage. Used in large national surveys (e.g., NFHS — National Family Health Survey).

Flow: State → select districts → within districts select villages/blocks → within villages select households → within households select individuals.

Advantage: practical for very large populations; needs frames only for selected higher-stage units.
Limitation: error accumulates at each stage; less precise than SRS of the same total sample size.
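The staged flow can be sketched as nested random draws. This is a simplified three-stage illustration with equal-probability selection at every stage; the state layout and function name are hypothetical, and real surveys such as NFHS use more elaborate designs.

```python
import random

def multistage_sample(frame, n_districts, n_villages, n_households, seed=None):
    """Three-stage sketch: sample districts, then villages, then households."""
    rng = random.Random(seed)
    selected = []
    for district in rng.sample(sorted(frame), n_districts):
        villages = frame[district]
        # A village list is needed only for the districts actually selected.
        for village in rng.sample(sorted(villages), n_villages):
            for household in rng.sample(villages[village], n_households):
                selected.append((district, village, household))
    return selected

# Hypothetical state: 10 districts x 8 villages x 30 households.
state = {
    f"D{d}": {f"D{d}-V{v}": [f"D{d}-V{v}-H{h}" for h in range(30)] for v in range(8)}
    for d in range(10)
}
sample = multistage_sample(state, n_districts=3, n_villages=2, n_households=5, seed=3)
print(len(sample))  # 3 x 2 x 5 = 30 households
```

Unlike cluster sampling, the final stage takes only a random subset of each selected unit rather than everyone in it.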

Multiphase sampling (do not confuse): information is collected from the whole sample in the first phase, then more detailed information from a subsample in later phases (e.g., screen everyone for symptoms, then do chest X-ray only on those who are symptomatic). Same units, different amount of information per phase.

Method	Frame needed	Cost	Sampling error	Best when
Simple random	Full list of all units	High effort	Low	Small population, list available
Systematic	Partial / sequential	Low	Low (unless periodicity)	Ordered list, quick fieldwork
Stratified	List + strata info	Moderate	Lowest	Heterogeneous population
Cluster	List of clusters only	Lowest	Highest	Large dispersed population
Multistage	Frames at each stage	Moderate	Moderate–high	National/large surveys

Non-probability sampling methods

Convenience sampling — subjects chosen because they are easy to access (e.g., patients walking into OPD). Cheapest, most biased.
Purposive (judgemental) sampling — the investigator deliberately selects subjects believed to be representative or information-rich (e.g., choosing "typical" villages, or key informants). Common in qualitative research.
Quota sampling — set quotas for subgroups (e.g., 50 men, 50 women) then fill them by convenience. The non-random analogue of stratified sampling.
Snowball sampling — existing subjects recruit further subjects; used for hidden/hard-to-reach populations (IV drug users, sex workers, undocumented migrants).

High-yield: A scenario describing recruitment of injecting drug users where one participant refers the next = snowball sampling. Deliberately picking "representative" or "expert" units = purposive sampling.

Scenario decoding — the exam skill

NEET PG rewards matching the story to the method:

"Every 5th patient attending OPD" → systematic.
"Population divided into income groups, then random sample from each" → stratified.
"20 villages selected at random and every household in them surveyed" → cluster.
"Districts → villages → households → individuals chosen in steps" → multistage.
"Names drawn by lottery / random number table" → simple random.
"Researcher picks villages he feels are typical" → purposive.
"Patients available in the ward today" → convenience.

Mnemonic for the five probability methods — "Some Students Study Cluster Maps": Simple random, Systematic, Stratified, Cluster, Multistage.

Sample size determination

Choosing how many subjects to enrol is critical: too few → study underpowered, misses a real effect; too many → wasteful, unethical. Sample size depends on the type of study.

Estimating a proportion (prevalence survey)

The classic formula:

n = Z²pq / d² (or 4pq/L² when Z ≈ 2 at 95% confidence)

Where:

Z = standard normal deviate for the chosen confidence level (1.96 for 95%, rounded to 2 in the shortcut formula; 2.58 for 99%).
p = expected/anticipated prevalence (proportion); q = 1 − p.
d (or L) = absolute allowable error / desired precision.

High-yield: In the formula n = 4pq/L², the 4 comes from Z² at 95% confidence (1.96² ≈ 4). This is a repeatedly asked one-liner.

Worked example: expected prevalence p = 20% (0.2), q = 0.8, allowable error d = 5% (0.05), 95% CI. n = 4 × 0.2 × 0.8 / (0.05)² = 0.64 / 0.0025 = 256.
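The same arithmetic as a small helper, as a sketch. The function name is illustrative, and rounding up to the next whole subject is a common convention rather than something the notes prescribe.

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """n = z^2 * p * q / d^2, rounded up to the next whole subject."""
    q = 1 - p
    n = (z ** 2) * p * q / (d ** 2)
    # The tiny tolerance stops floating-point noise from pushing 256.0 up to 257.
    return math.ceil(n - 1e-9)

print(sample_size_proportion(0.2, 0.05, z=2))     # 256, the 4pq/L² shortcut
print(sample_size_proportion(0.2, 0.05, z=1.96))  # 246, using the exact Z
```

The shortcut with Z = 2 slightly overestimates n compared with Z = 1.96, which errs on the safe side.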

High-yield: Maximum required sample size for a proportion occurs at p = 0.5 (because pq is maximal at 50%). When prevalence is unknown, assume p = 50% to be safe — this gives the largest, most conservative n.
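Why pq peaks at 50% follows from one line of calculus, shown here as a short derivation using the same symbols as the formula above:

```latex
f(p) = p\,q = p(1 - p), \qquad
\frac{df}{dp} = 1 - 2p = 0 \;\Rightarrow\; p = 0.5, \qquad
\frac{d^{2}f}{dp^{2}} = -2 < 0 \ (\text{a maximum})

n_{\max} = \frac{4 \times 0.5 \times 0.5}{L^{2}} = \frac{1}{L^{2}}
```

For example, with L = 0.05 the conservative sample size is 1 / 0.0025 = 400, compared with 256 when p = 0.2.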

Estimating a mean

n = Z²σ² / d². This needs an estimate of the population standard deviation σ (from a pilot study or the literature) and the allowable error d. Greater variability (larger σ) demands a larger sample.
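A sketch of the formula for a mean, with made-up inputs: σ = 10 and d = 2 are illustrative values in the same units as the measurement, and the function name is hypothetical.

```python
import math

def sample_size_mean(sigma, d, z=1.96):
    """n = z^2 * sigma^2 / d^2, rounded up to the next whole subject."""
    n = (z * sigma / d) ** 2
    return math.ceil(n - 1e-9)  # tolerance guards against floating-point overshoot

print(sample_size_mean(sigma=10, d=2))  # 97
print(sample_size_mean(sigma=20, d=2))  # 385
```

Doubling σ roughly quadruples n, because σ enters the formula squared.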

Comparing two groups (RCT / analytic study)

Sample size then also depends on:

Power (1 − β): the probability of detecting a true difference; conventionally 80% or 90%. Higher power → larger n.
Type I error (α / significance level): usually 0.05; a smaller α (more stringent) → larger n.
Effect size: the magnitude of difference you want to detect; smaller effect size → much larger n.
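These three factors can be seen working together in one common textbook formula for comparing two means, n per group = 2(Zα/2 + Zβ)²σ² / Δ². This formula is not given in the notes; it is used here as an assumption, for a two-sided test with equal group sizes and a shared σ. The function name and inputs are illustrative.

```python
import math
from statistics import NormalDist

def sample_size_two_means(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n = 2 * (z_alpha/2 + z_beta)^2 * sigma^2 / delta^2 (two-sided)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

print(sample_size_two_means(sigma=10, delta=5))              # 63 per group
print(sample_size_two_means(sigma=10, delta=5, power=0.90))  # 85 per group
print(sample_size_two_means(sigma=10, delta=2.5))            # 252 per group
```

Raising power from 80% to 90% adds about a third more subjects, while halving the effect size to be detected quadruples the requirement.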

Factors affecting sample size (summary table)

Factor	Effect on required sample size
↑ Desired precision (smaller allowable error d)	↑ Sample size (n ∝ 1/d²)
↑ Confidence level (95% → 99%)	↑ Sample size
↑ Power (80% → 90%)	↑ Sample size
↓ Significance level α (0.05 → 0.01)	↑ Sample size
↑ Variability (σ) / prevalence near 50%	↑ Sample size
Smaller effect size to detect	↑ Sample size
Anticipated dropout / non-response	↑ Sample size (inflate accordingly)

High-yield: Sample size is inversely proportional to the square of the allowable error (n ∝ 1/d²). Halving the permissible error quadruples the required sample size — a classic conceptual MCQ.

High-yield: Always inflate the calculated n for expected non-response/dropout. If expected dropout is 10%, divide n by 0.9 (i.e., n_adjusted = n / (1 − dropout fraction)).
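The inflation step as a one-line helper, applied to the n = 256 from the prevalence example. The function name is illustrative.

```python
import math

def inflate_for_dropout(n, dropout):
    """n_adjusted = n / (1 - dropout fraction), rounded up."""
    return math.ceil(n / (1 - dropout) - 1e-9)

print(inflate_for_dropout(256, 0.10))  # 285
```

Dividing by 0.9 is not the same as adding 10%: 256 × 1.1 rounds up to 282, and after a 10% loss that leaves about 254, just short of the 256 required. Recruiting 285 leaves about 256.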

Finite population correction

For small populations, the calculated n is reduced using n_adj = n / (1 + n/N). With very large N, correction is negligible.
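A sketch of the correction using the formula exactly as written above; the population sizes are illustrative and the function name is hypothetical.

```python
import math

def finite_population_correction(n, N):
    """n_adj = n / (1 + n/N), rounded up to the next whole subject."""
    return math.ceil(n / (1 + n / N) - 1e-9)

print(finite_population_correction(256, 1_000))      # 204
print(finite_population_correction(256, 1_000_000))  # 256
```

In a population of 1,000 the requirement drops from 256 to 204; in a population of a million it does not change.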

Type I vs Type II error — quick anchor (links to sample size)

	Reality: H₀ true	Reality: H₀ false
Reject H₀	Type I error (α) — false positive	Correct (power, 1−β)
Fail to reject ("accept") H₀	Correct	Type II error (β) — false negative

Increasing sample size reduces β / increases power without changing α, which is why adequate sample size is the chief safeguard against a false-negative study.

Errors and bias — distinctions tested

Sampling error ↓ with larger n; random; quantifiable in probability sampling.
Selection bias — systematic; arises from how subjects are chosen (e.g., convenience sampling, volunteer bias); not fixed by larger n.
Non-response bias — those who respond differ systematically from those who do not.

High-yield: A bigger sample reduces sampling error but does NOT correct bias. A biased method (e.g., convenience sampling) with a huge sample is still biased — the famous Literary Digest 1936 poll failure is the textbook illustration.

Complications / pitfalls of poor sampling

Under-coverage: sampling frame misses parts of the population (e.g., telephone surveys miss those without phones).
Periodicity bias in systematic sampling.
Design effect in cluster sampling — variance is inflated relative to SRS; the design effect (DEFF) > 1 means cluster surveys need a larger sample to match SRS precision.
Volunteer/self-selection bias in non-probability methods.
Non-response lowering effective sample and biasing results.
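The design effect can be put into numbers with the widely used approximation DEFF = 1 + (m − 1)ρ, where m is the average cluster size and ρ the intracluster correlation. This formula is not stated in the notes and is used here as an assumption; the values m = 21 and ρ = 0.05 are made up for illustration.

```python
import math

def design_effect(cluster_size, icc):
    """DEFF = 1 + (m - 1) * rho, a common approximation for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

def cluster_adjusted_n(n_srs, cluster_size, icc):
    """Inflate an SRS sample size by the design effect."""
    return math.ceil(n_srs * design_effect(cluster_size, icc) - 1e-9)

deff = design_effect(cluster_size=21, icc=0.05)
print(f"DEFF = {deff:.2f}")                                # DEFF = 2.00
print(cluster_adjusted_n(256, cluster_size=21, icc=0.05))  # 512
```

A design effect of 2 means the cluster survey needs twice the SRS sample size for the same precision. When ρ = 0 (units within a cluster are no more alike than strangers), DEFF falls back to 1.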

Recently asked / exam angle

Scenario MCQs are the dominant format: a sampling procedure is described and you must name the method (systematic vs stratified vs cluster vs multistage). Master the seven decode patterns above.
"Which method gives the least sampling error in a heterogeneous population?" → Stratified random.
"Which probability method has the maximum sampling error?" → Cluster sampling.
"Sampling interval in systematic sampling = ?" → N/n.
In n = 4pq/L², what does 4 represent? → Z² ≈ (1.96)² at 95% confidence.
At what prevalence is required sample size maximum? → p = 50%.
"Increasing sample size reduces which error?" → Sampling error / Type II error (β), NOT bias and NOT α.
WHO immunisation coverage survey uses → 30-cluster (30×7) sampling.
NFHS uses → multistage stratified sampling.
Non-probability method for hidden populations (drug users) → snowball.
"Effect of halving allowable error on n?" → n becomes 4× (quadruples).

Rapid revision

Probability sampling = known chance of selection → sampling error can be calculated; non-probability cannot.
Simple random sampling = lottery/random numbers; least biased reference standard; needs full frame.
Systematic sampling interval k = N/n, random start 1–k; main flaw = periodicity.
Stratified sampling → divide into homogeneous strata → least sampling error in heterogeneous populations (most precise).
Cluster sampling → study all units in randomly chosen clusters → largest sampling error, but cheapest for dispersed populations.
Multistage = sampling in stages (NFHS); multiphase = more detail on a subsample of the same units.
Sample size for proportion: n = 4pq/L²; the 4 = Z² at 95% CI; n ∝ 1/L².
Maximum n is needed when p = 50% (pq maximal); assume p = 0.5 if prevalence unknown.
Sample size ↑ with greater precision, higher power, lower α, larger variability, smaller effect size.
Larger sample reduces sampling error and β (raises power) but does NOT remove bias or change α.
Always inflate n for non-response/dropout: n_adj = n / (1 − dropout fraction).
Snowball = hidden populations; purposive = expert/judgement selection; quota = non-random analogue of stratified.
