Question 1

What's a 'Wilson score' confidence interval?

Accepted Answer

It's a more accurate confidence interval for proportions than the textbook formula. The textbook (Wald) interval is p ± 1.96·sqrt(p(1-p)/n), which breaks down at small n and at extreme rates (e.g. 0/100 gives '0% ± 0%', which is meaningless). The Wilson interval always stays inside 0–100% and keeps close to its nominal 95% coverage even for small samples and extreme proportions. It has been widely recommended over Wald since around 2000. R's prop.test reports it, statsmodels offers it via proportion_confint(method='wilson'), and many analytics tools use it.
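
As a rough illustration of the difference, here is a minimal Python sketch of both intervals. The function names and the fixed z = 1.96 (95% confidence) are assumptions made for this example, not the calculator's actual code.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Textbook (Wald) interval: p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)

print(wald_interval(0, 100))    # collapses to a zero-width interval
print(wilson_interval(0, 100))  # roughly (0.0, 0.037)
print(wilson_interval(5, 100))  # roughly (0.022, 0.112)
```

For 0/100 the Wald interval has zero width, while Wilson still reports an upper bound of about 3.7%.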

Question 2

Why does the CI matter?

Accepted Answer

Because conversion rates are estimates with uncertainty, not facts. '5% conversion rate from 100 visitors' has a 95% Wilson interval of roughly 2% to 11%, so the true rate could plausibly be anywhere in that range. Treating the point estimate as the truth is how teams ship A/B test 'wins' that disappear two weeks later once more users have been sampled.

Question 3

How big does my sample need to be?

Accepted Answer

It depends on the precision you need and on the rate itself. The usual approximation is n ≈ 1.96²·p(1-p)/E², where E is the margin of error as a proportion. For ±2 percentage points at 95% confidence at the worst-case rate (50%), you need ~2,400 trials. For ±5 pp, ~385. For ±0.5 pp, ~38,400. The 'How many visitors do I need' section computes this directly.
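
The figures above follow from the normal-approximation formula n = z²·p(1-p)/E². A minimal sketch, assuming that formula and a hypothetical function name (the calculator's own section may compute it differently):

```python
import math

def required_sample_size(margin, p=0.5, z=1.96):
    """Trials needed for a +/- margin interval (margin given as a proportion)."""
    raw = z**2 * p * (1 - p) / margin**2
    # Round first so floating-point noise cannot push an exact integer up by one.
    return math.ceil(round(raw, 6))

for margin in (0.02, 0.05, 0.005):
    print(margin, required_sample_size(margin))
```

Passing a smaller p (for example 0.05 for a 5% conversion rate) gives a smaller n, because p(1-p) peaks at 50%.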

Question 4

How does the A/B test mode decide significance?

Accepted Answer

It runs a pooled two-proportion z-test: pool the conversion rate across both variants (p̂ = (convA + convB) / (nA + nB)), compute z = (pB − pA) / √(p̂(1−p̂)(1/nA + 1/nB)), and convert |z| to a two-tailed p-value. If p < 0.05, the badge reads 'statistically significant' — meaning a difference at least this large would appear less than 5% of the time if the variants truly converted identically. Example: A 50/1000 vs B 65/1000 is a +30% relative uplift, but z ≈ 1.44 and p ≈ 0.15 — not significant yet.
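
The steps described above can be sketched in a few lines of Python. The function name and return shape are assumptions for illustration; only the formulas come from the answer.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-tailed p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = two_proportion_z_test(50, 1000, 65, 1000)
print(f"z = {z:.2f}, p = {p:.2f}")  # z = 1.44, p = 0.15
```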

Question 5

My uplift is +30% — why isn't it significant?

Accepted Answer

Because the gap between the two rates is still small relative to the sampling noise in their difference. Relative uplift is a point estimate; significance asks whether the gap could plausibly be noise given your sample sizes. Small samples and low base rates need surprisingly large n: detecting a real 10–20% relative lift on a 5% base rate typically takes thousands to tens of thousands of visitors per variant. Keep the test running, and fix the sample size in advance rather than stopping the first time a peek looks significant, or you'll inflate your false-positive rate.
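
To get a feel for the sample sizes involved, here is a rough sketch using the standard normal-approximation formula for comparing two proportions. The 80% power target, the two-sided 5% significance level, and the function name are assumptions for this example, not settings taken from the calculator.

```python
import math
from statistics import NormalDist

def visitors_per_variant(base_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

print(visitors_per_variant(0.05, 0.20))  # 5% -> 6%
print(visitors_per_variant(0.05, 0.10))  # 5% -> 5.5%
```

Under these assumptions, a 20% relative lift on a 5% base rate needs on the order of 8,000 visitors per variant, and a 10% lift roughly 31,000.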

Question 6

What do revenue per visitor and revenue per conversion tell me?

Accepted Answer

Enter the revenue those visitors generated (single mode) and you get revenue / visitor (RPV — the value of traffic, useful for valuing ad clicks) and revenue / conversion (average order value). RPV = conversion rate × revenue per conversion, so it moves when either lever moves — which is why a variant can win on conversion rate and still lose on revenue.
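
A small sketch of that decomposition, with made-up numbers chosen so that variant B wins on conversion rate but loses on RPV. The function name, dictionary keys, and figures are all hypothetical.

```python
def revenue_metrics(visitors, conversions, revenue):
    """Return conversion rate, revenue per visitor, and revenue per conversion."""
    conversion_rate = conversions / visitors
    rpv = revenue / visitors
    # Revenue per conversion (average order value); 0 if nothing converted.
    aov = revenue / conversions if conversions else 0.0
    return {"conversion_rate": conversion_rate, "rpv": rpv, "aov": aov}

# Hypothetical data: B converts more visitors but at a lower order value.
a = revenue_metrics(visitors=1000, conversions=50, revenue=5000.0)
b = revenue_metrics(visitors=1000, conversions=65, revenue=4550.0)

for name, m in (("A", a), ("B", b)):
    print(name, f"CR={m['conversion_rate']:.1%}", f"AOV={m['aov']:.2f}", f"RPV={m['rpv']:.2f}")
```

Here B's conversion rate is higher (6.5% vs 5.0%), yet its RPV is lower (4.55 vs 5.00) because its average order value dropped from 100 to 70.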

Conversion Rate Calculator
