A/B Test Significance Calculator
Check whether your A/B test results are statistically significant.
Enter your data for Control (A) and Variation (B). We'll tell you if your variation beats control.
The calculator reports: achieved confidence, p-value, relative lift, control rate, variation rate, and Z-score.
Frequently asked questions
How do I know if my A/B test is statistically significant?
Enter your visitors and conversions for control and variation. The calculator returns the p-value and achieved confidence level. If achieved confidence is at or above your threshold (typically 95%), the result is significant.
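For readers who want to check the numbers by hand, here is a minimal sketch of how those outputs can be computed. It assumes a pooled two-proportion z-test with a two-sided p-value, which is a common choice but not necessarily the exact method this calculator uses. The function name `ab_significance` and its parameter names are hypothetical.

```python
import math


def ab_significance(visitors_a, conversions_a, visitors_b, conversions_b):
    """Pooled two-proportion z-test (two-sided). Assumed method, not confirmed."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b

    # Pool both groups to estimate the conversion rate under "no difference".
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if se == 0:
        raise ValueError("No variation in the data; the z-test is undefined.")

    z = (rate_b - rate_a) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    return {
        "control_rate": rate_a,
        "variation_rate": rate_b,
        "relative_lift": (rate_b - rate_a) / rate_a,
        "z_score": z,
        "p_value": p_value,
        "confidence": 1 - p_value,
    }


result = ab_significance(1000, 100, 1000, 130)
print(result["confidence"] >= 0.95)  # compare against your chosen threshold
```

The relative lift is undefined when the control has zero conversions, so a production version would need to guard that case as well.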
What confidence level should I use?
95% is the industry standard. Use 99% for high-stakes decisions (homepage rewrites, pricing changes). Use 90% only for exploratory tests where false positives are cheap.
Why is my A/B test not significant?
Usually because you haven't collected enough data, or because the real lift is smaller than what you sized the test to detect. Run the sample size calculator with your current baseline and a smaller MDE to see how much more data you'd need.
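To see why a smaller MDE needs so much more data, here is a rough sketch of a standard sample size formula for comparing two proportions. It assumes a two-sided test, 80% power, and an MDE expressed as a relative lift. These defaults, and the function name `sample_size_per_variant`, are illustrative assumptions and may not match the sample size calculator exactly.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Approximate visitors needed per variant (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # rate you want to be able to detect
    p_bar = (p1 + p2) / 2

    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided
    z_beta = NormalDist().inv_cdf(power)

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Halving the MDE roughly quadruples the required sample.
print(sample_size_per_variant(0.10, 0.10))
print(sample_size_per_variant(0.10, 0.05))
```

Required sample size scales with the inverse square of the absolute difference, which is why small lifts take so long to confirm.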
Should I use one-sided or two-sided?
Two-sided. It's the safe default and protects you from missing regressions. One-sided is only valid in narrow cases where you would treat a worse-than-control result exactly the same as no difference.
What is a p-value in A/B testing?
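The p-value is the probability of seeing a difference at least as large as the one you observed if there were no real difference between control and variation. A p-value of 0.03 means that, with no true effect, random variation alone would produce a gap this large about 3% of the time. It does not mean there is a 3% chance the lift is a fluke. Below 0.05 is the standard threshold for significance.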
My test is significant after 2 days. Can I stop?
No. Early significance is unreliable. Every time you check the results, noise gets another chance to push the test across the threshold, so stopping the moment it crosses inflates your false positive rate. Run until you have reached your pre-calculated sample size and the test has run for at least 7 days.
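The effect of peeking can be illustrated with a small simulation. The sketch below runs A/A tests, where both arms share the same true conversion rate so any "significant" result is a false positive, and compares stopping at the first significant look against checking only once at the end. All names and parameter values here are hypothetical and chosen to keep the run short.

```python
import math
import random


def z_score(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-score; returns 0.0 when it is undefined."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def simulate_peeking(n_tests=300, looks=10, visitors_per_look=200,
                     rate=0.10, seed=0):
    """Return (false positive rate with peeking, rate with one final check)."""
    rng = random.Random(seed)
    crossed_at_any_look = 0
    significant_at_end = 0
    for _ in range(n_tests):
        conv_a = conv_b = n = 0
        crossed = False
        z = 0.0
        for _ in range(looks):
            # Both arms draw from the same true rate: there is no real lift.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            z = z_score(conv_a, n, conv_b, n)
            if abs(z) > 1.96:
                crossed = True  # a peeker would have stopped and shipped here
        crossed_at_any_look += crossed
        significant_at_end += abs(z) > 1.96
    return crossed_at_any_look / n_tests, significant_at_end / n_tests


peeking_rate, final_rate = simulate_peeking()
print(f"peeking: {peeking_rate:.1%}, single final check: {final_rate:.1%}")
```

Because the final look is also one of the peeks, the peeking rate can never be lower than the single-check rate. In practice it is usually several times higher.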
What's a good Z-score for A/B testing?
At 95% two-sided confidence, you need a Z-score above 1.96 (or below -1.96) for significance. At 99%, the threshold rises to 2.58 (or below -2.58). The higher the absolute Z-score, the stronger the evidence.
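These thresholds come from the standard normal distribution, and you can derive them for any confidence level. A minimal sketch using Python's standard library follows. The helper name `z_threshold` is hypothetical.

```python
from statistics import NormalDist


def z_threshold(confidence, two_sided=True):
    """Critical Z-score for a given confidence level."""
    alpha = 1 - confidence
    # A two-sided test splits alpha evenly between the two tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_threshold(level), 2))
```

At 90% two-sided confidence the same calculation gives roughly 1.64, which is why lower confidence levels are easier to reach.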
Can I compare more than two variations?
This calculator handles a single A/B comparison (control versus one variation). For A/B/n tests, run each variation against the control as a separate comparison and apply a Bonferroni correction by dividing your significance threshold by the number of comparisons.
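As a quick illustration of the Bonferroni step, the sketch below takes the p-value from each variation-versus-control comparison and checks it against the adjusted threshold. The variation names and p-values are made-up example inputs, and `bonferroni_significant` is a hypothetical helper.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Apply a Bonferroni correction to a dict of {variation: p_value}."""
    if not p_values:
        raise ValueError("Need at least one comparison.")
    # Divide the significance threshold by the number of comparisons.
    adjusted_alpha = alpha / len(p_values)
    verdicts = {name: p < adjusted_alpha for name, p in p_values.items()}
    return adjusted_alpha, verdicts


# Example: three variations, each compared against control separately.
adjusted_alpha, verdicts = bonferroni_significant({"B": 0.03, "C": 0.01, "D": 0.20})
print(adjusted_alpha, verdicts)
```

With three comparisons the threshold drops from 0.05 to about 0.0167, so a p-value of 0.03 that would pass in a plain A/B test no longer counts as significant.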
My variation is significantly worse than control. What do I do?
Don't ship it. Negative results are still useful: you learned what doesn't work. Roll back to control and use the learning to inform your next hypothesis.