How Varify calculates significance

    In short

    This article explains how Varify evaluates test results with a frequentist method, using one-sided chi-square and t-tests to detect significance more quickly. It also explains why longer run times and few metrics - especially in A/A tests - are important to avoid decisions driven by chance.

    Calculation of significance in app.varify.io

    Varify evaluates test results with a frequentist statistical method. It calculates how likely it is that a difference between the variant and the original has arisen by chance. If chance can be largely ruled out, Varify displays the complement of the calculated p-value, the so-called significance. If this is greater than 95 %, the result is displayed as significant in the tool.
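    For example, a p-value of 0.03 corresponds to a significance of 1 - 0.03 = 0.97, i.e. 97 %, which is above the 95 % threshold and would therefore be shown as significant.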

    In order to evaluate the results as efficiently as possible, Varify uses two established statistical tests:

    • For binomial goals (e.g. click rate, conversion rate), a one-sided chi-square test is used.

    • For revenue or value metrics (e.g. Average Order Value, Revenue per Visitor), a one-sided Student's t-test is used.

    These one-sided tests are a deliberate choice. They deliver results faster because they are less conservative than two-sided methods. This lets customers see earlier whether a variant is likely to perform better.
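    To make this concrete, here is a minimal sketch of the two test types in Python, using SciPy and statsmodels. It illustrates the general approach, not Varify's actual implementation; all numbers are made up, and a one-sided two-proportion z-test stands in for the directional chi-square test on a 2x2 table (the two are statistically equivalent).

    ```python
    import numpy as np
    from scipy import stats
    from statsmodels.stats.proportion import proportions_ztest

    # Binomial goal (e.g. conversion rate): one-sided test on two proportions.
    # On a 2x2 table this is the directional counterpart of the chi-square test.
    conversions = np.array([560, 500])       # variant, original (dummy numbers)
    visitors = np.array([10_000, 10_000])
    _, p_binomial = proportions_ztest(conversions, visitors, alternative="larger")

    # Value goal (e.g. revenue per visitor): one-sided Welch t-test.
    rng = np.random.default_rng(1)
    revenue_variant = rng.gamma(shape=2.0, scale=26.0, size=10_000)   # dummy data
    revenue_original = rng.gamma(shape=2.0, scale=25.0, size=10_000)
    _, p_value = stats.ttest_ind(revenue_variant, revenue_original,
                                 equal_var=False, alternative="greater")

    for name, p in [("conversion rate", p_binomial),
                    ("revenue per visitor", p_value)]:
        significance = (1 - p) * 100     # the value reported as "significance"
        verdict = "significant" if significance > 95 else "not significant"
        print(f"{name}: p = {p:.4f} -> significance = {significance:.1f} % ({verdict})")
    ```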

    Of course, there is also a downside: if a test runs only briefly, or many metrics are evaluated at the same time, the chance of a so-called false positive increases - i.e. a result that appears significant although in reality it was only a coincidence.

    Therefore:
    👉 Better to test a little longer, so that the results stabilize and you can judge more reliably whether a variant is really better.

    Especially for A/A tests, it is important to add only a few goals. Due to the so-called alpha error accumulation, each additional metric increases the probability that a false positive occurs - in other words, a supposed winner that is not really a winner. The short sketch below quantifies this effect.
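    How quickly this risk grows can be shown in a few lines of Python (assuming independent metrics and a 95 % significance threshold, i.e. alpha = 0.05 per metric):

    ```python
    # Alpha error accumulation: probability of at least one false positive
    # across k independent metrics, each tested at alpha = 0.05.
    alpha = 0.05
    for k in (1, 3, 5, 10):
        p_any = 1 - (1 - alpha) ** k
        print(f"{k:>2} metrics: P(at least one false positive) = {p_any:.1%}")
    # ->  1 metric: 5.0 %, 3: 14.3 %, 5: 22.6 %, 10: 40.1 %
    ```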

    Best practices for A/A testing:

    • Running time: at least 10 days

    • At least 500 conversions per variant

    • A maximum of 3 goals, with a focus on the main KPI

    • Significance values that appear in between should be ignored; only the end result counts. This is the only way to keep the false positive rate low and the results truly reliable.

    Calculating significance yourself with a significance calculator

    Check your A/B test results for significant differences yourself. For this purpose, Varify.io provides a significance calculator.
