• A/A test shows significant results

    How to recognize the error

    Your A/A test shows a significant result for one or more metrics after a few days. 

    How to fix the error

    An A/A test is used to check the reliability of the test setup: two identical variants are tested against each other, so in theory both should achieve the same results. If an A/A test nevertheless shows a significant result, this usually points not to a real effect but to statistical or technical causes.

    1. What significance means in the A/A test

    Varify evaluates test results with a frequentist approach.
    The significance it displays is the complement of the p-value (1 − p, shown as a percentage), i.e. the degree to which chance can be ruled out as an explanation.
    If this value is above 95 %, Varify shows a significant result.

    In an A/A test, however, this does not mean that one variant is actually "better", but that the data happens to be distributed in such a way that the probability of a real difference appears greater than it actually is.
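
    To make this concrete, here is a minimal sketch of how a frequentist two-proportion z-test turns conversion counts into a p-value and then into the kind of significance percentage described above. The numbers are invented, and the code illustrates the general approach, not Varify's internal calculation:

        import math

        def significance(conv_a, n_a, conv_b, n_b):
            """Return (1 - p) * 100 from a two-sided two-proportion z-test."""
            p_a, p_b = conv_a / n_a, conv_b / n_b
            pooled = (conv_a + conv_b) / (n_a + n_b)
            se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
            z = abs(p_b - p_a) / se
            p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
            return (1 - p_value) * 100

        # Two identical variants, differing only by a random handful of conversions:
        print(round(significance(40, 1000, 62, 1000), 1))  # ~97.5 -> reads as "significant"

    A random imbalance of a couple of dozen conversions is enough to push the displayed value past 95 %, even though both variants are identical.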


    2. Common causes of a "false" significant result

    a) Runtime too short
    With too little data, chance can still have a strong effect. A brief significant swing is normal and not a reliable signal.

    b) Too many goals
    Each additional goal increases the probability of a so-called alpha error (false positive): the chance that a difference is found somewhere purely by accident grows with the number of metrics. With a 5 % threshold, for example, the probability of at least one random "significant" result among three independent metrics is about 1 − 0.95³ ≈ 14 % (see the simulation sketch after this list).

    c) Uneven traffic allocation
    If visitors are not evenly distributed across the variants (e.g. due to caching, bot traffic or incomplete delivery of the variants), the result can be distorted.

    d) Sample too small
    With low conversion counts, metrics fluctuate strongly. A difference of just a few conversions can produce a seemingly high significance.
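
    The alpha error from b) can be made tangible with a small Monte Carlo sketch: two identical variants are simulated many times, and we count how often at least one of several (here independent) metrics crosses the 95 % threshold purely by chance. Traffic, conversion rate and metric count are invented parameters:

        import math
        import random

        def p_value(conv_a, conv_b, n):
            """Two-sided two-proportion z-test with equal sample sizes."""
            pooled = (conv_a + conv_b) / (2 * n)
            se = math.sqrt(pooled * (1 - pooled) * (2 / n))
            if se == 0:
                return 1.0
            z = abs(conv_b - conv_a) / n / se
            return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

        random.seed(42)
        VISITORS, RATE, METRICS, RUNS = 2000, 0.05, 3, 1000
        false_alarms = 0
        for _ in range(RUNS):
            for _ in range(METRICS):  # identical variants: same true rate everywhere
                a = sum(random.random() < RATE for _ in range(VISITORS))
                b = sum(random.random() < RATE for _ in range(VISITORS))
                if p_value(a, b, VISITORS) < 0.05:
                    false_alarms += 1
                    break  # this A/A test already looks "significant" somewhere

        print(f"A/A tests with at least one false positive: {false_alarms / RUNS:.0%}")
        # Expect roughly 1 - 0.95**3, i.e. about 14 %, rather than the nominal 5 %.

    Setting METRICS to 1 brings the rate back down to around the nominal 5 %, which is exactly why limiting the number of goals matters.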


    3. Best practices for A/A tests

    To ensure that A/A tests provide reliable results, we recommend:

    • Runtime: at least 10 days

    • Amount of data: at least 500 conversions per variant

    • Goals: a maximum of 3 metrics, with a focus on the main KPIs

    • Ignore intermediate results: significance values can fluctuate during the runtime; only the final result at the end of the test is meaningful (see the sketch below).

    This ensures that the influence of chance remains low and the evaluation delivers realistic results.
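
    Why intermediate results should be ignored can be illustrated with one more sketch that tracks the significance of a single A/A metric day by day. Daily traffic and conversion rate are again invented; the point is only that the value wanders during the runtime instead of converging smoothly:

        import math
        import random

        def significance(conv_a, conv_b, n):
            """(1 - p) * 100 from a two-sided two-proportion z-test, equal n."""
            pooled = (conv_a + conv_b) / (2 * n)
            se = math.sqrt(pooled * (1 - pooled) * (2 / n))
            if se == 0:
                return 0.0
            z = abs(conv_b - conv_a) / n / se
            return (1 - 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))) * 100

        random.seed(7)
        DAILY_VISITORS, RATE = 300, 0.05
        conv_a = conv_b = n = 0
        for day in range(1, 15):
            n += DAILY_VISITORS
            conv_a += sum(random.random() < RATE for _ in range(DAILY_VISITORS))
            conv_b += sum(random.random() < RATE for _ in range(DAILY_VISITORS))
            print(f"Day {day:2d}: significance {significance(conv_a, conv_b, n):5.1f} %")

    Re-running this with different seeds produces readings that occasionally spike close to, or past, 95 % early on and drop again later, which is why only the final value at the end of the planned runtime should be trusted.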


    4. Conclusion

    In most cases, a significant result in an A/A test is not a real signal, but is attributable to chance or to the test configuration.
    Only when sufficient data has been collected over a longer period, so that chance can be statistically ruled out, is a result truly reliable.
