Just like you, when I hear the words “A/B test”, my blood starts pumping. Crafting a well-formed hypothesis, making sure our KPIs are measured and observable, then running tests to try and influence them: that’s reason enough for any marketer to fly out of bed in the morning.
If you work anywhere near a marketing or CRO team, you know just how important a role A/B testing plays in a marketer’s toolkit. It gives us a robust understanding of our processes, systems, customers, marketing channels and websites. It’s foundational to continuous improvement, innovation, and observing & influencing the world around us. If you’re wondering how to go about A/B testing, you can find our Complete Guide to A/B Testing here.
There’s something we don’t tend to talk about, though… It’s a bit of a dirty secret. Something that, once considered, could shelve that next A/B test you were thinking about running. It isn’t hard to see, either. It’s right there in plain sight.
A/B testing isn’t always financially viable, and it doesn’t always make sense…
Calculating statistical significance:
First things first: you’re probably not a statistician, are you? Neither am I, but before we jump into A/B testing theory, I’ll give you what you need to know to run a statistically significant test.
Statistical significance isn’t a number you read straight off a dashboard. It’s a judgement about whether a result is unlikely to have happened by chance, usually made by comparing a p-value against a threshold agreed before the test (0.05 is a common choice). Think about flipping a coin. Getting heads 3 times out of 5 flips (60%) is unremarkable, but getting heads 300 times out of 500 flips (the same 60%) would be very unlikely by chance alone. The more data you have, the more confidently you can rule out luck.
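To put numbers on the coin example, here’s a minimal Python sketch that computes the exact probability of getting at least a given number of heads from a fair coin. The function name `prob_at_least` is purely illustrative. At 5 flips, three or more heads turns up half the time (16 of the 32 equally likely outcomes); at 500 flips, 300 or more heads happens well under once in ten thousand tries.

```python
from math import comb


def prob_at_least(heads: int, flips: int) -> float:
    """Exact probability of at least `heads` heads in `flips` tosses of a fair coin."""
    # Count every outcome with `heads` or more heads, out of 2**flips equally likely outcomes.
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips


# Same 60% heads rate, very different levels of surprise.
print(f"P(>= 3 heads in 5 flips):     {prob_at_least(3, 5):.3f}")
print(f"P(>= 300 heads in 500 flips): {prob_at_least(300, 500):.2e}")
```

The larger sample is what turns “60% heads” from a shrug into a red flag, which is the whole idea behind needing enough data.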
The Litmus Test [D+T=R+]
Okay, now that the statistical significance jargon is out of the way, let’s look at what A/B testing really is. A/B testing is the act of splitting something into 2 variations in order to test which variation is more effective. Subject headings, button colours, product images… Any element within a process or user journey can be altered with an A/B test to test an improvement or to prove or disprove a theory.
A/B testing is a critical part of Conversion Rate Optimisation. It lets us improve our systems with relatively low risk, gives us clear and actionable data, and can often prove results in a very short amount of time. How short, you ask? Well, as any marketer worth their salt would say: it depends. Ugh… Do you hate that answer as much as I do? Good, keep reading…
You see, when deploying A/B tests to prove a hypothesis, testing against large data sets will save you from a world of pain: data sets large enough to reach statistical significance, given as much time as possible (within reason), or at least a reasonably defined timeframe for your project. All too often, marketers and CROs fall short on one, or sometimes both, of these key components, so the test never produces a clear result. Sometimes we can tip the balance, gathering more data over less time or giving less data more time, but ultimately, if either is insufficient, how can our results be significant? So, here’s a quick equation for a successful test:
Healthy Data (D) + Healthy Time (T) = Healthy Test Results (R+)
Let’s talk about the (D)
Let’s address critical component #1 of the equation: the data. Your data… To support a successful test, your data needs to be clean, trustworthy, influenceable and shareable. If any of these is lacking, A/B testing will waste time and resources for you and your organisation.
Let’s take a simple CRO task as an example. You’ve read an article claiming that green buttons are better than blue buttons (eeeeeewwwww, blue buttons?), so you decide to run an A/B test to see whether changing your buttons from blue to green improves clicks on your site’s CTAs… But are you even tracking clicks on those CTAs? How long have you been tracking them? Where are you logging this data? Is it consistent, trustworthy and free from bugs or even spam bots?
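If you want a quick first pass at those questions, a script like the sketch below can help. It assumes a hypothetical export of CTA clicks as (date, user agent) pairs; the log format, the `audit_clicks` helper and the bot keywords are all illustrative assumptions, and a simple keyword match will miss plenty of bots. It flags days with no recorded clicks (often a sign that tracking broke) and the share of clicks coming from bot-like user agents.

```python
from datetime import date, timedelta

# Crude, hypothetical heuristic: user-agent substrings that suggest automated traffic.
BOT_MARKERS = ("bot", "crawler", "spider", "headless")


def audit_clicks(events, start: date, end: date):
    """Return (days with no recorded clicks, share of clicks from bot-like agents)."""
    days_seen = {date.fromisoformat(day) for day, _ in events}
    all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    missing = [d for d in all_days if d not in days_seen]
    bots = sum(1 for _, agent in events if any(m in agent.lower() for m in BOT_MARKERS))
    bot_share = bots / len(events) if events else 0.0
    return missing, bot_share


# Hypothetical sample export: (ISO date, user agent) for each CTA click.
clicks = [
    ("2024-03-01", "Mozilla/5.0 (Windows NT 10.0)"),
    ("2024-03-01", "Googlebot/2.1"),
    ("2024-03-03", "Mozilla/5.0 (iPhone)"),
    ("2024-03-04", "HeadlessChrome/120.0"),
]
missing, bot_share = audit_clicks(clicks, date(2024, 3, 1), date(2024, 3, 4))
print("Days with no clicks:", missing)  # a gap may mean tracking broke that day
print(f"Bot-like share of clicks: {bot_share:.0%}")
```

In the sample data, 2 March has no clicks and half the clicks look automated: exactly the kind of thing to resolve before you start comparing button colours.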
In the planning phase of your next test, ask yourself the following questions:
- What data am I trying to influence?
- Can I trust this data?
- Does it have the same integrity as the author of this article?
- Can I influence this data with a test?
- Right or wrong, how do I share my results?
- What size of change would be statistically significant for a data set of this size?
A/B testing, especially for CRO, is rooted in incremental improvement: slight changes that slowly but steadily improve your overall results. So expecting a 25% uplift in conversions from a test is highly unrealistic. You’re far more likely to see changes somewhere within ±5%.
More often than not, answering the questions above will reveal a snag, because data is rarely as clean or trustworthy as it should be. It takes a lot of time, effort, people and resources to reach the level of data maturity required for accurate testing.
Never fear: if you do hit a snag, simply scope the work required to bring your data up to the level of sophistication a test needs. By the time you get there, your test might not even be as important as you think… and you may learn a thing or two along the way.
If you’ve cleared all of the data-centric questions, don’t get too excited; we’re not out of the woods yet.
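To see why small lifts are so hard to prove, the sketch below estimates the minimum detectable effect: the smallest relative lift a 50/50 conversion-rate test can reliably pick up, using the standard normal approximation for two proportions at 5% significance and 80% power. The 3% baseline conversion rate and the visitor counts are hypothetical, and `minimum_detectable_lift` is an illustrative name, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_lift(baseline_rate: float, visitors_per_arm: int,
                            alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate smallest relative lift a two-arm conversion test can reliably detect.

    Normal approximation for two proportions with equal group sizes, using the
    baseline variance for both groups (a common simplification).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # chance of catching a real effect
    std_err = sqrt(2 * baseline_rate * (1 - baseline_rate) / visitors_per_arm)
    absolute_mde = (z_alpha + z_power) * std_err
    return absolute_mde / baseline_rate  # express as a relative lift


# Hypothetical 3% baseline conversion rate at three traffic levels.
for visitors in (5_000, 50_000, 500_000):
    lift = minimum_detectable_lift(0.03, visitors)
    print(f"{visitors:>7,} visitors per arm -> detectable lift ~ {lift:.1%}")
```

With a 3% baseline, 5,000 visitors per arm can only reliably detect a lift of roughly 32%; 50,000 per arm gets you to about 10%, and it takes around 500,000 per arm before a 5% lift is within reach.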
(T) Time… (hat, rabbit, clock, lol)
Critical component #2 in the equation is time. As with any project, it’s important to set the parameters within which your test will be conducted. Essential to this is understanding the rate at which you are collecting the data you’re trying to influence. Knowing this will also help you understand what a healthy timeline for your project looks like.
As an example, let’s say you’re collecting, on average, 20 email leads per day from an on-site pop-up. That’s 140 leads per week, or roughly 610 per month. Regardless of the nature of your test conditions, how long will it take you to show at least a 2%, 4% or 5% uplift? How are you going to split your test groups? 50/50? 75/25? I can see your eyes glazing over, but hear me out with a little more maths:
Let’s say you go with a 50/50 split. You’re now testing for statistical significance against 305 leads per month per group, expecting a lift of up to 15 additional leads per month (roughly 5%) from your focus set (your version B). We’re assuming your control remains steady. Is 15 leads enough for you to make the call? Is it statistically significant? Remember, statistical significance refers to how confident you can be that a result was not caused by random chance…
So, let’s address our next set of questions:
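One rough way to answer that, assuming traffic is split evenly and leads arrive independently at a steady rate, is a conditional binomial test: if both versions perform the same, each of the combined leads is equally likely to have come from A or B. The sketch below applies it to 305 control leads versus 320 variant leads (305 plus the 15 extra). The function name is illustrative, and a real analysis would also account for how many visitors actually saw each version.

```python
from math import comb


def equal_rate_p_value(control_leads: int, variant_leads: int) -> float:
    """Exact two-sided p-value for 'both versions generate leads at the same rate'.

    Assumes a 50/50 split over the same period, so under the null hypothesis each
    lead lands in either group with probability 0.5 (a conditional binomial test).
    """
    total = control_leads + variant_leads
    larger = max(control_leads, variant_leads)
    # Probability that the bigger group gets at least this many of the total leads...
    upper_tail = sum(comb(total, k) for k in range(larger, total + 1)) / 2 ** total
    # ...doubled for a two-sided test (the binomial with p = 0.5 is symmetric).
    return min(1.0, 2 * upper_tail)


control, variant = 305, 305 + 15  # one month of leads per group, from the example
p = equal_rate_p_value(control, variant)
print(f"p-value: {p:.2f}")
```

The p-value comes out well over 0.5, nowhere near the conventional 0.05 threshold: a month of data at this volume can’t tell a 5% lift apart from noise.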
- What is the rate at which I am gathering the data required for this test?
- What influence should I expect to have on the data on any given day?
- How many days will it take for me to prove that results aren’t just chance?
- How often will I be checking in on the results of this test?
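To put a number on the “how many days” question, the sketch below estimates how long a 50/50 test needs to run to detect a given lift in a count like daily leads. It uses a normal approximation to two Poisson counts at 5% significance and 80% power; those thresholds, the steady-rate assumption and the `days_to_detect` name are all illustrative choices rather than a standard you must follow.

```python
from math import ceil
from statistics import NormalDist


def days_to_detect(daily_per_arm: float, relative_lift: float,
                   alpha: float = 0.05, power: float = 0.8) -> int:
    """Rough number of days a 50/50 count-based test needs to detect a given lift.

    Normal approximation to two Poisson counts: after t days the difference in
    counts has mean (r1 - r0) * t and variance of roughly (r0 + r1) * t.
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    r0 = daily_per_arm                         # control's daily rate
    r1 = daily_per_arm * (1 + relative_lift)   # variant's daily rate if the lift is real
    # Solve (r1 - r0) * t / sqrt((r0 + r1) * t) >= z_total for t.
    return ceil(z_total ** 2 * (r0 + r1) / (r1 - r0) ** 2)


# The example's 20 leads per day, split 50/50, gives 10 per group per day.
for lift in (0.05, 0.10, 0.25):
    print(f"{lift:.0%} lift: ~{days_to_detect(10, lift)} days")
```

At 10 leads per group per day, a 5% lift needs somewhere around 640 days of testing, a 10% lift roughly 165 days, and a 25% lift about a month. That’s the trade-off between data and time in a nutshell.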
If you have made it here unscathed, congratulations! You’re ready to run an exciting test. Go forth, split! lol
If any of the above can’t be answered, or can be answered but throws a spanner in the works, it raises the question…
Is an A/B test really worth it?
In most cases, no, because they aren’t done right! Far too often we use A/B testing without giving it the time and effort it needs to be meaningful. As marketers, we need to be a little more critical of the tactics we deploy to move the needle on our KPIs. After all, open rate, conversion rate and click-through rate all sit atop ROI: return on investment, return on time, possibly the most critical business metric of all, and the metric that should be the first consideration when deploying an A/B test.
Cool, but what’s the alternative?
Well, here’s where it gets exciting… Before reaching the level of sophistication required for successful testing, you should first take a critical view of whatever it is you’re trying to improve. Ask yourself the following questions:
- Is the system we’re trying to improve up to standard? What does best practice look like?
- What are my peers doing that I could learn from or replicate?
- Does this just require a hot fix (fixing an issue quickly, without testing)?
- Is it worth surveying my customers about this? What would they say?
- How important is it to uplift results? Does leadership have the buy-in to give us the time we need to adjust our practices and make them more meaningful?
So, in summary…
Don’t get me wrong… I wake up every day craving a good experiment. But therein lies the problem. Too often, we overlook perfectly logical reasons not to test, all for the thrill of feeling like Walter White in Breaking Bad. So, the next time an A/B test comes up for discussion, get excited, for sure, but throw some common sense at it.