We’ve all heard advertising claims backed by sketchy statistics, like “9 out of 10 doctors prefer” this product or that. After hearing so many of these claims we suspect that the reality of the test was, “we gave 20 chosen doctors a choice between our product and another, we picked the 9 that chose our product, and threw in one that picked the other, and now we’re hoping you believe that 90% of doctors prefer our product.”
If you want a good A/B test, then make sure it is designed to be statistically sound. Set up your test on the right statistical foundation, and you can rely on the results. Ignore the statistical foundation, and you may as well guess. A solid foundation involves several factors, but its cornerstone is the p-value.
In plain English, a p-value of .047 says that if there were truly no difference between your variants, you would see a result at least this extreme only 4.7% of the time by random chance alone. It does not mean there is a 4.7% chance a repeat of the experiment would come out differently; it measures how surprising your data would be if nothing had actually changed.
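To make this concrete, here is a minimal sketch of how a p-value like that could be computed for a conversion-rate A/B test. It uses the common pooled two-proportion z-test and only the Python standard library; the function name and the conversion numbers are illustrative, not from the original article.

```python
from statistics import NormalDist

def ab_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates,
    using the pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pool both groups to estimate the conversion rate under the null
    # hypothesis that the variants perform identically.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    # Probability of a |z| at least this large if the null were true.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Example: variant A converted 120 of 1,000 recipients, variant B 90 of 1,000.
print(ab_test_p_value(120, 1000, 90, 1000))
```

Note the interpretation: the returned value is the chance of seeing a gap this large between the two groups when there is no real difference, which is exactly the quantity described above.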
A key here is not to consider the test a failure if the results are inconclusive (p-value is greater than .05). Knowing that changes to certain email content or timing likely won't have an effect on your audience is just as useful for future communication strategies. If you still feel strongly that the first experiment wasn't enough to capture the difference in your group's responses, then replicate the experiment to add to the strength of your results.
Once you understand the p-value, you still need to design your test correctly. Here are a few rules of thumb that can help:
Choose the right sample size
Don’t change too many variables at once
Know exactly what you want to test before starting
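The first rule of thumb, choosing the right sample size, can itself be computed up front. The sketch below uses the standard per-group sample-size formula for comparing two proportions; the function name, baseline rate, and target lift are assumptions for illustration, not figures from the article.

```python
import math
from statistics import NormalDist

def required_sample_size(p_base: float, p_target: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size needed to detect a lift from p_base to
    p_target with a two-sided test at significance level alpha and the
    given statistical power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_base - p_target) ** 2
    return math.ceil(n)

# How many recipients per variant to detect a lift from 10% to 12%?
print(required_sample_size(0.10, 0.12))
```

Running this before the test tells you whether your list is even big enough to detect the lift you care about; note how quickly the requirement grows as the expected lift shrinks.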