A/A test shows significant results
How to recognize the error
Your A/A test shows a significant result for one or more metrics after a few days.
How to fix the error
An A/A test is used to check the reliability of the test setup. Two identical variants are tested against each other, so in theory both should achieve the same results. If an A/A test nevertheless shows a significant result, this usually points not to a real effect but to statistical or technical causes.
1. What significance means in an A/A test
Varify evaluates test results with a frequentist approach.
When chance can largely be ruled out, the complement of the p-value (1 − p, shown as a percentage) is displayed as the significance.
If this value is above 95%, Varify shows a significant result.
In an A/A test, however, this does not mean that one variant is actually "better". It means that the data happened to fall in a way that makes a difference look more likely than it really is.
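The relationship between p-value and displayed significance can be illustrated with a small sketch. It assumes a two-sided two-proportion z-test; the article only states that Varify uses a frequentist approach, so the exact test, the function name `significance` and the example numbers are assumptions for illustration.

```python
import math

def significance(conv_a, n_a, conv_b, n_b):
    """Return (1 - p) * 100 for a two-sided two-proportion z-test.

    Assumption: this is one common frequentist test, not necessarily
    the one Varify uses internally.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis "no difference"
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0  # no conversions (or only conversions): nothing to test
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (1 - p_value) * 100

# Two identical variants that happen to differ by 80 conversions
print(f"{significance(500, 10_000, 580, 10_000):.1f} %")
```

In this made-up example the significance lands above 95% even though both variants are identical by construction, which is exactly the situation described above.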
2. Common causes of a "false" significant result
a) Run time too short
With too little data, chance can still have a strong effect. A short-term spike in significance is normal and not a reliable signal.
b) Too many goals
Each additional goal increases the probability of a so-called alpha error (a false positive). In other words, the more metrics you evaluate, the greater the chance that a difference shows up somewhere purely by chance.
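How quickly this risk grows can be estimated with a short calculation. The sketch below assumes that the metrics are statistically independent and that each is tested at a 5% alpha level (matching the 95% threshold). Real goals are often correlated, so treat the numbers as a rough upper guide; the function name is hypothetical.

```python
def family_wise_error_rate(num_metrics, alpha=0.05):
    """Probability of at least one false positive across num_metrics
    independent tests, each run at the given alpha level."""
    return 1 - (1 - alpha) ** num_metrics

for k in (1, 3, 5, 10):
    print(f"{k:>2} metrics: {family_wise_error_rate(k) * 100:.1f} %")
# prints:
#  1 metrics: 5.0 %
#  3 metrics: 14.3 %
#  5 metrics: 22.6 %
# 10 metrics: 40.1 %
```

Under these assumptions, ten goals already give roughly a 40% chance that at least one of them appears significant in an A/A test.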
c) Uneven traffic allocation
If visitors are not evenly distributed across the variants (e.g. due to caching, bot traffic or incomplete delivery of a variant), the result can be distorted.
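One way to check whether an observed split is plausible is a chi-square goodness-of-fit test on the visitor counts, often called a sample ratio mismatch check. This is a minimal sketch, not a Varify feature: the function name, the intended 50/50 split and the warning threshold of 0.01 are assumptions.

```python
import math

def srm_p_value(visitors_a, visitors_b, expected_share_a=0.5):
    """p-value of a chi-square goodness-of-fit test (1 degree of freedom)
    comparing the observed visitor split with the intended split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical visitor counts for the two identical variants
for a, b in [(5000, 5050), (5000, 5400)]:
    p = srm_p_value(a, b)
    verdict = "check allocation" if p < 0.01 else "split looks plausible"
    print(f"{a} vs {b}: p = {p:.4f} -> {verdict}")
```

A very small p-value suggests that the uneven split is unlikely to be chance alone, so the traffic allocation should be investigated before the conversion metrics are interpreted.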
d) Sample too small
With few conversions, metrics fluctuate strongly. A difference of just a few conversions can then produce a seemingly high significance.
3. Best practices for A/A tests
To ensure that A/A tests provide reliable results, we recommend:
Run time: at least 10 days
Amount of data: at least 500 conversions per variant
Goals: a maximum of 3 metrics, with a focus on the main KPIs
Ignore intermediate results: Significance values may fluctuate while the test is running. Only the final result at the end of the test is meaningful.
This ensures that the influence of chance remains low and the evaluation delivers realistic results.
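Why intermediate results should be ignored can be shown with a small simulation of A/A tests. The sketch below is an illustration under assumed numbers (200 visitors per variant per day, a 10% conversion rate, 10 days, a two-sided two-proportion z-test) and does not reproduce Varify's internal calculation. It compares how often a test is flagged if you look every day with how often it is flagged only at the end.

```python
import math
import random

def significance(conv_a, n_a, conv_b, n_b):
    """1 - p of a two-sided two-proportion z-test, as a fraction."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 1 - math.erfc(abs(z) / math.sqrt(2))

def simulate_aa_tests(num_tests=500, days=10, daily_visitors=200,
                      rate=0.10, seed=1):
    """Return (share flagged on any day, share flagged on the last day)."""
    rng = random.Random(seed)
    flagged_when_peeking = 0
    flagged_at_end = 0
    for _test in range(num_tests):
        conv_a = conv_b = visitors = 0
        crossed = False
        latest = 0.0
        for _day in range(days):
            visitors += daily_visitors
            # Both variants share the same true conversion rate
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            latest = significance(conv_a, visitors, conv_b, visitors)
            crossed = crossed or latest > 0.95
        flagged_when_peeking += crossed
        flagged_at_end += latest > 0.95
    return flagged_when_peeking / num_tests, flagged_at_end / num_tests

peeking, final = simulate_aa_tests()
print(f"significant on at least one day: {peeking:.1%}")
print(f"significant at the end only:     {final:.1%}")
```

Because the last day is one of the daily checks, the first share can never be lower than the second. In simulations like this it is typically several times higher, which is why only the final result should be evaluated.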
4. Conclusion
In most cases, a significant result in an A/A test is not a real signal but can be attributed to chance or to the test configuration.
A result is only truly reliable once sufficient data has been collected over a longer period and chance can be largely ruled out statistically.