Sample Size & Statistical Power Suite

Estimate required sample size, achieved power, and practical recruitment targets across seven study modes with transparent assumptions and formula guidance.

Last Updated: March 2026

Formula Preview

n = (Z^2 x p x (1 - p)) / E^2

This suite uses educational approximations. Real-world studies can require additional adjustments for clustering, repeated measures, multiplicity, and outcome distribution shape.
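The preview formula above can be sketched directly with Python's standard library. This is a minimal, educational version (the function name is illustrative), not the calculator's internal implementation:

```python
from math import ceil
from statistics import NormalDist

def proportion_sample_size(confidence: float, p: float, margin: float) -> int:
    """Required n to estimate a proportion: n = Z^2 * p * (1 - p) / E^2."""
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: you cannot recruit a fractional participant.
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Conservative plan: 95% confidence, 5% margin, p = 0.5
print(proportion_sample_size(0.95, 0.5, 0.05))  # -> 385
```

With 95% confidence, a 5% margin, and p = 0.5, this reproduces the classic planning result of 385.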

Assumption Checklist

  • Effect assumptions should be realistic, not optimistic.
  • Alpha controls false-positive risk; power controls false-negative risk.
  • Dropout/nonresponse should be applied before recruitment planning.
  • Statistical significance is not the same as practical importance.

Choose a mode and enter assumptions to estimate required sample size, adjusted recruitment needs, and achieved power where relevant.

Educational Use Disclaimer

This tool provides educational estimates only. It is not legal, regulatory, or medical advice, and it is not a substitute for protocol-level statistical review. Clustered data, repeated measures, stratification, multiple testing, dropout behavior, and non-normal outcomes can materially change required sample size.

How This Calculator Works

This suite starts by normalizing inputs for the selected mode. Percentage-based fields are converted to internal rates where needed, while mean-based precision fields remain in their native outcome units. The calculator then validates ranges, checks for contradictory assumptions, and applies a mode-specific formula.

In estimation modes, the output centers on required sample size for the target confidence and precision. In comparison modes, the output focuses on detectable difference planning under alpha and power assumptions. For achieved-power mode, the suite estimates what your existing sample can detect under the assumptions you provide.

Dropout or nonresponse adjustments are applied after base sample-size estimation. Final recruitment recommendations are rounded upward so practical planning does not rely on fractional participants. Where group comparisons are used, the calculator returns per-group and total requirements.
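That adjustment order can be sketched as a small helper (illustrative name; assumes a single uniform dropout rate applied after the base estimate):

```python
from math import ceil

def inflate_for_dropout(base_n: int, dropout_rate: float) -> int:
    """Recruit enough that the expected analyzable sample still meets base_n."""
    # Dividing by the retention rate, then rounding up, avoids
    # planning around fractional participants.
    return ceil(base_n / (1 - dropout_rate))

print(inflate_for_dropout(385, 0.10))  # -> 428
```

A base requirement of 385 with 10% expected dropout becomes a recruitment target of 428.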

Outputs include formula summaries, assumption notes, warnings for fragile scenarios, and quick sensitivity hints. This design helps beginners get a usable estimate while still exposing the core assumptions that analysts, students, and researchers must evaluate before execution.

What You Need to Know

What is sample size, and why does it matter?

Sample size is the number of observations you plan to collect before analysis. It controls how much random noise affects your estimate and how likely your study is to detect a real signal. Too small, and your study can miss meaningful effects. Too large, and you may spend unnecessary time and budget for marginal precision gains. Good planning balances statistical reliability, practical cost, and decision speed.

Underpowered studies often produce unstable conclusions, wide confidence intervals, and inconsistent replication. Overpowered studies can detect tiny effects that are statistically significant but operationally unimportant. A well-designed study should target an effect that is both detectable and decision-relevant.

What is statistical power?

Statistical power is the probability of detecting an effect when that effect truly exists at the size you specified. Common planning targets are 80% and 90%. If power is 80%, your design still carries a 20% Type II error risk under the assumed effect.

Power depends on effect size, sample size, variance, alpha level, and test directionality. A tiny effect needs a larger sample. A noisy outcome needs a larger sample. A stricter alpha level also increases required sample for the same power target.
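Those dependencies show up directly in a small achieved-power sketch for a two-proportion comparison. This uses a two-sided normal approximation with an unpooled standard error; the function name and inputs are illustrative:

```python
from statistics import NormalDist

def achieved_power_two_proportions(p1: float, p2: float,
                                   n_per_group: int,
                                   alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    # Unpooled standard error of the difference in proportions.
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    return nd.cdf(abs(p1 - p2) / se - z_alpha)

small = achieved_power_two_proportions(0.10, 0.12, 1000)
large = achieved_power_two_proportions(0.10, 0.12, 4000)
print(round(small, 2), round(large, 2))
```

Quadrupling the per-group sample moves power for a 2-point lift from roughly 30% to above 80%, illustrating how sample size and effect size trade off.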

Alpha, confidence, and significance in plain language

Alpha is the false-positive risk tolerance in hypothesis testing. Confidence level is the companion concept used in interval estimation. For example, 95% confidence corresponds to alpha 0.05 in many common two-sided settings. They are related, but they answer different questions.

A result can be statistically significant but practically minor. Decision quality improves when you evaluate both statistical detectability and practical effect magnitude. If you need quick lift math while planning assumptions, pair this workflow with the Percentage Calculator.

Mode guide: which calculator mode should you use?

Mode | Best For | Primary Inputs
Single Proportion | Estimate a rate or percentage with a chosen confidence and margin of error. | Expected proportion, confidence level, margin of error, optional population size.
Two-Proportion / A/B Test | Detect a difference between control and variant conversion rates. | Baseline rate, variant rate or lift, alpha, power, allocation ratio.
Single Mean | Estimate a continuous mean with target precision. | Standard deviation, confidence level, margin of error in outcome units.
Two-Mean Comparison | Detect a mean difference between independent groups. | Mean difference, standard deviation, alpha, power, allocation ratio.
Survey Sample Size | Apply finite population correction when population size is known. | Population size, confidence level, margin of error, expected proportion.
Prevalence | Estimate prevalence with optional design-effect expansion for complex sampling. | Expected prevalence, confidence, precision, design effect, nonresponse.
Power Calculator | Estimate achieved power from an existing sample size and assumptions. | Sample size, alpha, study type assumptions, allocation ratio.

Start with the mode that matches your endpoint type: proportions for yes/no outcomes, means for continuous outcomes, and achieved power when sample size is already fixed. Use survey or prevalence modes when finite-population context or design-effect adjustments are central.

Effect size and minimum detectable effect (MDE)

Effect size is the signal you want to detect. In conversion testing, this may be an absolute percentage-point lift or a relative lift. In mean-based studies, effect size is often the minimum meaningful difference in outcome units. The MDE is the smallest effect likely to be detected with your chosen alpha, power, and sample size.

Tiny effects can be real but expensive to detect. Before choosing a small effect target, ask if that change would alter product, policy, or research decisions. If not, a larger, practical target may lead to faster and more useful study cycles.
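A quick MDE sketch makes this trade-off concrete for a conversion test. As a simplification, it uses the baseline variance for both arms; names and defaults are illustrative:

```python
from statistics import NormalDist

def mde_two_proportions(baseline: float, n_per_group: int,
                        alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest absolute lift detectable under a normal approximation,
    using baseline variance in both arms as a simplification."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return z * (2 * baseline * (1 - baseline) / n_per_group) ** 0.5

# With a 10% baseline and 4,000 per group, the MDE is roughly a
# 1.9 percentage-point absolute lift.
print(round(mde_two_proportions(0.10, 4000), 4))
```

If that 1.9-point lift would not change any decision, the sample is being spent on precision you do not need; if only much smaller lifts matter, the study is underpowered at this size.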

Formulas and assumptions

The suite uses standard educational formulas and normal-approximation methods for broad accessibility. These methods are appropriate for many planning tasks, but assumptions matter: independence, stable baseline rates, and reasonably well-behaved outcome distributions are all important.

Design | Formula Snapshot | Core Assumptions
Single proportion | n = (Z^2 x p x (1 - p)) / E^2 | Normal approximation; binary outcome; simple random sampling.
Survey with finite population | n = n0 / (1 + (n0 - 1)/N) | Useful when sampling fraction is meaningful and population size N is known.
Single mean | n = (Z x sigma / E)^2 | Outcome measured on a continuous scale with assumed standard deviation.
Two-proportion / A/B | Normal approximation with Z_alpha, Z_beta, pooled and alternative variance terms | Independent groups and stable conversion assumptions.
Two-mean comparison | n_A = ((Z_alpha + Z_beta)^2 x sigma^2 x (1 + 1/r)) / Delta^2 | Independent groups, approximate normality, common variance assumption.
Prevalence with design effect | n = (Z^2 x p x (1 - p) / d^2) x DEFF | Clustered or design-complex samples often need DEFF > 1 and nonresponse adjustment.

These formulas are intentionally transparent so you can inspect sensitivity to assumptions. For example, proportion formulas are most conservative at p = 0.5, while mean-based formulas scale with squared variability. Doubling standard deviation can quadruple required sample.
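The finite population correction from the table above is easy to inspect in the same spirit (illustrative helper name; n0 is the uncorrected sample size):

```python
from math import ceil

def fpc_adjust(n0: float, population: int) -> int:
    """Finite population correction: n = n0 / (1 + (n0 - 1) / N)."""
    return ceil(n0 / (1 + (n0 - 1) / population))

# A base requirement of 385 shrinks when the population is only 2,000...
print(fpc_adjust(385, 2000))       # -> 323
# ...but is essentially unchanged for a very large population.
print(fpc_adjust(385, 10 ** 9))    # -> 385
```

This is why FPC matters for small, known populations and is negligible for very large ones.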

Worked examples (plain-language walkthroughs)

Scenario | Inputs | Interpretation
Survey proportion | 95% confidence, 5% margin, p = 50% | Conservative plan near n = 385 before nonresponse and FPC adjustments.
A/B test | 10% baseline vs 12% variant, alpha 0.05, power 80%, 1:1 split | Returns per-group and total sample size needed to detect a 2-point lift.
Mean estimation | SD = 12, margin = 3, confidence 95% | Precision target gives an educational estimate near n = 62 before dropout inflation.
Prevalence | Expected prevalence 18%, precision 3%, confidence 95%, DEFF 1.5 | Design effect raises required sample before nonresponse adjustment.

Example 1 (survey): with 95% confidence, 5% margin of error, and p = 50%, the classic planning result is near 385 before nonresponse adjustment. This is why 50% is commonly used when prior prevalence is uncertain.

Example 2 (A/B): a 10% baseline and 12% variant imply a 2-point absolute lift. At alpha 0.05 and power 80%, the tool returns per-group sample estimates, total N, and adjusted recruitment after expected exclusions.
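This per-group calculation can be sketched with the pooled-plus-alternative variance form shown in the formula table (function name is illustrative; assumes a 1:1 split):

```python
from math import ceil
from statistics import NormalDist

def ab_sample_size(p1: float, p2: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided two-proportion test (1:1 allocation)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2)
    z_b = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Pooled variance under the null plus unpooled variance under
    # the alternative, per the formula-table snapshot.
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(num / (p1 - p2) ** 2)

per_group = ab_sample_size(0.10, 0.12)
print(per_group, 2 * per_group)  # per-group and total N
```

For the 10% vs 12% scenario this lands near 3,841 per group (about 7,700 total) before dropout inflation, which is why 2-point lifts on modest baselines are expensive to detect.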

Example 3 (single mean): if SD is 12 and target precision is plus/minus 3 at 95% confidence, required n is around 62 before attrition adjustment. If variability rises to 18, required n increases substantially.
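The single-mean formula makes this sensitivity easy to verify: required n scales with the square of the standard deviation, so doubling SD roughly quadruples the requirement (illustrative helper name):

```python
from math import ceil
from statistics import NormalDist

def mean_sample_size(sd: float, margin: float,
                     confidence: float = 0.95) -> int:
    """n = (Z * sigma / E)^2 to estimate a mean to +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin) ** 2)

print(mean_sample_size(12, 3))  # -> 62
print(mean_sample_size(18, 3))  # -> 139
print(mean_sample_size(24, 3))  # -> 246 (about 4x the SD = 12 requirement)
```

Moving SD from 12 to 18 more than doubles the requirement, and doubling it to 24 quadruples the raw estimate.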

Example 4 (prevalence with design effect): clustered sampling with DEFF 1.5 inflates sample requirements versus simple random sampling. This is a common reason field studies need larger recruitment than classroom formulas suggest.

Common planning mistakes and how to avoid them

Mistake | Why It Matters | What To Do Instead
Using unrealistic effect sizes | Overly optimistic effects shrink n and increase underpower risk. | Anchor assumptions to pilot data, historical baselines, or practical decision thresholds.
Forgetting dropout/nonresponse | Final analyzable sample becomes smaller than planned. | Inflate recruitment with explicit dropout adjustment before launch.
Mixing confidence and power | Design targets become inconsistent or incorrectly interpreted. | Treat confidence/alpha and power as different controls serving different goals.
Ignoring multiple testing | False-positive risk can rise above nominal alpha. | Use corrected alpha or formal multiplicity planning for multi-metric studies.
Skipping design effect | Clustered sampling can need materially larger n. | Apply design effect for clustered or complex survey designs.
Treating significance as business value | Tiny effects can be significant but not meaningful. | Pair statistical detectability with practical impact thresholds.

One frequent error is assuming significance automatically means practical value. Another is skipping dropout adjustments and then discovering the final analyzable sample is smaller than required. Build realistic assumptions up front and revisit them after pilot data.

When this calculator is not enough

Some designs require specialized methods beyond compact educational calculators. If your study uses clustering, repeated measures, survival outcomes, adaptive stopping, noninferiority margins, or multiple-primary endpoints, protocol-specific analysis is usually necessary.

Scenario | Why Advanced Review Is Needed
Clustered trials or cluster surveys | Intraclass correlation and design effect change effective sample size.
Repeated measures or longitudinal studies | Within-subject correlation needs specialized power methods.
Survival/time-to-event endpoints | Event counts and censoring drive power, not just raw sample size.
Noninferiority/equivalence studies | Margins and directional hypotheses need protocol-specific formulas.
Adaptive or sequential experiments | Interim looks require alpha spending and adjusted boundaries.
Skewed/heavy-tailed outcomes | Robust or transformation-based methods may be required.
Multivariate or multiple-primary endpoints | Joint error control can increase required sample.
Regulated clinical protocols | Independent statistical review is expected before study approval.

For high-stakes or regulated projects, use this page as a starting estimate and then involve a qualified statistician for full protocol power analysis and documentation.

Further reading and next study-planning steps

  • Confidence intervals and uncertainty interpretation.
  • Hypothesis testing workflow and error-rate control.
  • Effect-size selection tied to practical decision thresholds.
  • Survey design basics: nonresponse, weighting, and frame quality.
  • A/B test interpretation with guardrail metrics and seasonality checks.
  • Prevalence studies with design effect and sampling frame bias review.
  • Power analysis fundamentals for continuous and binary outcomes.

For quick supporting math during planning, use the Percentage Calculator and timeline checks with the Date Duration Calculator. You can browse the full Statistics Calculators hub for additional tools as this section expands.

Final takeaway

Sample-size planning is a decision framework, not a single magic number. Use this suite to compare assumptions, understand tradeoffs, and produce transparent planning estimates. Then validate assumptions with domain context, pilot data, and expert review when stakes are high.

Frequently Asked Questions

What does a sample size calculator do?

A sample size calculator estimates how many observations you need to reach a chosen precision or detection goal under specific assumptions such as confidence level, effect size, alpha, and power.

How do I calculate sample size for a survey?

Start with confidence level, margin of error, and expected response distribution. If population size is known, apply finite population correction to reduce the sample needed for smaller populations.

What is statistical power?

Statistical power is the probability that your study detects a true effect of the size you care about. A common target is 80%, meaning a 20% Type II error risk under your assumed effect.

What inputs does an A/B test sample size calculation need?

You need baseline conversion, target variant conversion (or detectable lift), alpha, power, and allocation ratio. Smaller effects require larger samples, and imbalance between groups can increase total sample needs.

How do confidence level and power differ?

Confidence level is tied to estimation uncertainty or alpha control, while power is tied to detecting a true effect in hypothesis testing. They solve different design questions and should not be treated as interchangeable.

How should I choose an effect size?

Use an effect size that is both realistic and practically meaningful for decisions. Overly optimistic effects underestimate required sample size and can leave studies underpowered.

Why is p = 50% the conservative default for proportions?

For proportion formulas, p(1-p) is largest at p = 0.5. That yields the most conservative estimate when you do not have reliable prior proportion information.

What is margin of error?

Margin of error is the maximum tolerated estimation error around a point estimate at a chosen confidence level. Tighter margins need larger samples.

How do I size a prevalence study?

Use expected prevalence, confidence level, desired precision, and optionally design effect for clustered sampling. Add nonresponse adjustment if some observations are expected to be missing.

What is finite population correction (FPC)?

FPC adjusts sample size downward when sampling a meaningful fraction of a finite population. It is most relevant when population size is known and not extremely large.

Can I use this calculator for clinical or regulated studies?

You can use it for educational planning, but regulated or high-stakes studies usually require protocol-specific modeling by qualified statisticians, including dropout, multiplicity, endpoint definitions, and design structure.

When should I consult a statistician?

Consult a statistician for clustered data, repeated measures, adaptive designs, survival outcomes, noninferiority/equivalence questions, multiple endpoints, or when decisions carry legal, regulatory, or clinical consequences.

Sources & References

  1. CDC Epi Info User Guide: StatCalc (Accessed March 2026)
  2. OpenEpi Sample Size and Power Resources (Accessed March 2026)
  3. statsmodels NormalIndPower Documentation (Accessed March 2026)
  4. statsmodels TTestIndPower Documentation (Accessed March 2026)
  5. NIST/SEMATECH e-Handbook of Statistical Methods (Accessed March 2026)
  6. Cochran WG. Sampling Techniques (reference text) (Accessed March 2026)