A/B Testing Tips and Common Mistakes

What is A/B testing and why is it important?

A/B testing is an optimisation strategy in which variants of a marketing campaign or webpage are tested against one another to determine which version performs best.

In an A/B test, you create two (or more) versions of the same campaign or page, both of which are near-identical bar one single element.

It’s important because it’s a repeatable, data-driven process that leads to incremental improvements. 

You’ll also see it referred to as split testing (because the audience is split in two, with each half seeing a different version) and A/B/n testing (which simply means more than two versions are being tested).

Some terms to know

Variants

Email A and Email B are both referred to as the variants. When naming the variants in a test, you typically refer to them by the difference between them, e.g. “personalised” and “non-personalised”.

Variable

This is the name given to the actual element being tested, whether that’s the CTA, layout, image and so on.

Control group

This is a percentage of your potential audience who will not receive the campaign at all. That might seem odd, but when testing the performance of variants on a conversion rate, for example, it’s important that you know whether or not you’d be better off not sending a campaign in the first place!

The control group winning an A/B test happens more often than you might think.
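To make the split concrete, here’s a minimal Python sketch of randomly assigning recipients to two variants plus a control group. The 10% control share, the 50/50 split and the group names are illustrative assumptions, not recommendations.

```python
import random
from collections import Counter

def assign_group(control_share=0.10):
    """Randomly assign a recipient to the control group or one of two variants.

    The 10% control share and the even split between the two variants are
    purely illustrative values.
    """
    roll = random.random()
    if roll < control_share:
        return "control"      # receives no campaign at all
    elif roll < control_share + (1 - control_share) / 2:
        return "variant_a"    # e.g. the personalised email
    else:
        return "variant_b"    # e.g. the non-personalised email

# Example: assign 10,000 recipients and check the group sizes
print(Counter(assign_group() for _ in range(10_000)))
```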

How to run a successful A/B test

Develop a hypothesis

What is your hypothesis for this test? It will be informed by what you’re testing (a landing page, an email, a push notification etc) and what you’re hoping to achieve (encourage sign-ups, drive sales, increase website traffic etc).

Your hypothesis is essentially a hunch, or educated guess, that you are looking to validate through the results of the test. 

For example, you have a feeling that including a 🚀 emoji in your email’s subject line will lead to more opens.

So you create two variants of the email, one with the emoji and one without, and let your email audience tell you if your theory is right or wrong.

Typically, all hypotheses are variations on the following:

    • Will my campaign perform better in the morning or later in the day?
    • Will including the recipient’s name in the copy lead to more engagement?
    • Would an image with the text be more impactful than the text by itself?
    • Is a green or a red CTA button more likely to get a higher click-through rate?

We’ll look at this in more detail later, but essentially you’re always testing either a creative element (image, copy, format etc) or a timing element (morning or evening, real-time or with a delay etc).

Define success and choose a metric

What is the metric you’ll use to compare the variants? It’s essential that you set a clear and measurable goal, otherwise there is no test!

It’s better if the metric is an easily quantifiable action, one that is either performed or not performed immediately after the campaign. So, for example, you might pick one of the following:

    • Open rate
    • Click-through rate

But it may be more appropriate to choose a revenue-related metric like a purchase. Our tip here would be to consider putting a timeframe around conversion goals. So, you’re only attributing the goal to the campaign if it’s within an hour (just for example) of it being sent.

Why? So that you are certain that it actually was this particular campaign that spurred the conversion and not something else (a display ad they saw a few hours later, or another email from you that they finally got around to opening).

Of course, for your brand and the product or service you offer, you may decide that the timing of the conversion doesn’t matter. The only question you’re interested in answering is whether Group A or Group B had more conversions.
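As a rough illustration of that attribution window, here’s a short Python sketch. The one-hour cut-off, the field names and the data structures are all hypothetical.

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(hours=1)  # illustrative cut-off, not a recommendation

def attributed_conversions(send_times, conversions, window=ATTRIBUTION_WINDOW):
    """Count conversions that happened within the window after the campaign was sent.

    send_times maps recipient id -> datetime the campaign was sent to them.
    conversions is a list of (recipient_id, datetime of the conversion) tuples.
    """
    count = 0
    for recipient_id, converted_at in conversions:
        sent_at = send_times.get(recipient_id)
        if sent_at is not None and sent_at <= converted_at <= sent_at + window:
            count += 1
    return count

# Example: one conversion inside the window, one several hours later
sends = {"alice": datetime(2024, 5, 1, 9, 0), "bob": datetime(2024, 5, 1, 9, 0)}
purchases = [("alice", datetime(2024, 5, 1, 9, 40)),  # counted
             ("bob", datetime(2024, 5, 1, 14, 0))]    # not counted
print(attributed_conversions(sends, purchases))       # -> 1
```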

Set a time limit and make sure you have the numbers

You can’t have a test running indefinitely; eventually, you have to bring it to an end and analyse the data. How long should you run an A/B test?

There’s no single right answer, because the more important factor is how large a dataset you have to assess.

For a test to be accurate it needs a large sample size. If, after your week-long landing page test, you’ve only had a handful of visits across the variants, then don’t read too much into the results.
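To get a feel for the numbers involved, here’s a minimal Python sketch of the standard two-proportion sample-size estimate. The 3% baseline rate, the 1% uplift you want to detect, and the roughly 95% confidence / 80% power settings are all assumptions for illustration.

```python
import math

def sample_size_per_variant(baseline_rate, minimum_uplift,
                            z_alpha=1.96, z_beta=0.8416):
    """Approximate visitors needed per variant to detect an absolute uplift.

    Uses the normal-approximation formula for comparing two proportions at
    roughly 95% confidence (z_alpha) and 80% power (z_beta).
    """
    p1 = baseline_rate
    p2 = baseline_rate + minimum_uplift
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(((z_alpha + z_beta) ** 2) * variance / minimum_uplift ** 2)

# Example: 3% baseline conversion rate, hoping to detect a lift to 4%
print(sample_size_per_variant(0.03, 0.01))  # roughly 5,300 visitors per variant
```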

Accept the result

This can be one of the toughest stages in a test: accepting that your hunch, which you may still feel strongly about, was wrong! It can be tempting to make excuses, or even run the test again, but don’t. If you set the test up correctly the first time around then chances are the results will be the same no matter how many times you run it.
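If you want an objective check before declaring a winner (or conceding that your hunch was wrong), a simple two-proportion z-test is one common approach. This is a sketch only; the conversion counts below are made up.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Z-score for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Example: variant A converted 150 of 5,000 recipients, variant B 190 of 5,000
z = two_proportion_z(150, 5000, 190, 5000)
print(round(z, 2))  # |z| above ~1.96 suggests a real difference at ~95% confidence
```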

Test another element

If the campaign or landing page you tested is evergreen, or at least will be running for a while longer, then test some other aspect. 

Satisfied you’ve got the perfect CTA? Then test the length of the copy. The subject line is working well? Look at adding a GIF to the body of the message.

Common mistakes when A/B testing

No matter what you are testing, there are three common pitfalls that everyone has fallen into at some point.

Too many variables

The biggest one, by far, is testing too much at once. The variants in an A/B test should be more or less identical.

If you change both the CTA and the layout of your campaign message then how can you know for certain which one is the decisive factor?

Now, if it’s a one-off campaign then you may not care too much about what caused the success; you’re just glad that there’s a winner. But A/B testing is also about gathering learnings for future campaigns.

And besides, how can you be sure that a third variant, where only the CTA was changed, might not have performed even better again? That’s why it’s important to test one thing at a time and improve it incrementally.

Not enough differentiation

And the second common error we see is that there’s not enough of a difference between the versions of the element being tested.

A simple example of this is testing two images which are only marginally different, or changing an inconsequential word in the copy (“really” vs “very”). It’s unlikely that there is going to be any statistically significant difference between the two variants. In other words, it’s a waste of time!

Uneven quality

And lastly, there’s not ensuring that both variants are actually equally good! Or at least, that they both have the potential to perform well.

This isn’t always straightforward by the way. If you have a bias towards the variant you believe is best then there’s a chance the other variant you create to test it against isn’t going to have as much effort put into it.

And of course, if one of the variants is just plain wrong or bad then it’s never going to be the winner! And you’re not going to learn much from the result.

In some ways, what you’re doing with an A/B test is pitting two competing philosophies or approaches against each other. It’s really saying, “I think a hard sell at the end will perform better for us than a softly-softly ask”.

What should you be A/B testing?

Alright, with all of that said, what are the types of elements that people typically test? Let’s break it down into four sections: creative, timing, channel and promotion.

Creative

The creative assets of a campaign or webpage are the most obvious place to start. That means looking at the use of emojis, GIFs and images (or not), injecting some form of dynamic content into the copy to personalise it versus keeping it generic, or going with a quirky tagline versus playing it safe.

More broadly speaking then, it’s also common to test the length of the copy used, or the layout it’s presented in. Will a shorter landing page where the user doesn’t even need to scroll out-perform our longer, more in-depth version? That’s the ideal question for an A/B test to answer.

Timing

There are some instances where the actual timing of the campaign is far and away the most important element, and the biggest contributing factor to its success or failure.

The best example is an abandoned cart recovery message. Is it better to send it 24 hours after the customer has left the checkout, or within 15 minutes? 

This is a challenging use case because depending on the circumstances in which the cart has been abandoned, either could be right.

It underlines an important thing to keep in mind when testing: you are never going to find the outright, perfect variant that works for every visitor and customer. All you can do is determine the one that is statistically most likely to get the job done.

Channel

In the era of true multichannel marketing, this is becoming a more common type of test. For example, does a promotion lead to more conversions if sent via a one-off email or an SMS?

Or to think about our cart recovery campaign again, is a direct push notification more effective than an on-site message when the visitor returns?

Promotion

And lastly, there’s testing different promotional offers. Assuming that you’ve got your calculations right and the margins are roughly the same, it comes down to what is psychologically more appealing to your audience.

So does a 5% discount appeal to more customers than free shipping? Is a straightforward £20 off more enticing than a 2-for-1?

What are the most important elements to test in an email campaign?

Subject line

47% of people decide to open an email, or not, based on the subject line. We’ve mentioned some classic ideas for testing like using emojis or personalisation. They are popular in tests for a reason, as very often they do lead to improvements.

Beyond those, however, we also recommend you experiment with bolder choices in your copy and even using capital letters to emphasise certain keywords. There’s a reason why email marketers write MAJOR DISCOUNTS and 50% OFF.

But the usual rules of thumb around good taste and your own brand guidelines obviously apply here. What’s right for a low-end brand may be entirely inappropriate for a luxury goods brand. And vice versa by the way!

From sender name

Instead of having the email appear in the customer’s inbox “from Brand A”, test how it performs coming from an actual employee. “Joe from Brand A” may just appeal that little bit more.

Day of the week

If you want to find evidence online that any one of the seven days of the week performs best, then there’s no shortage of case studies and reports, all with different opinions.

The only way you’ll know if your promotional email is better at the start of the week or the end is by…yes, testing it.

Layout and style

Image above the text, or side by side? Huge feature image at the top, or straight in with “hello”? A GIF from The Office or one from Parks and Recreation?

And what about your colour palette? To be honest, there are so many possible ideas and combinations of ideas that it’s easy to be overwhelmed. Our advice is just to make some decisions and not overthink too much. Within a couple of tests, you’ll have found a strong template that works for you.

What’s a multivariate test?

A multivariate, or multiple variant, test is essentially an A/B test where multiple elements are being tested at once.

Earlier, we said that it’s best practice to test one thing at a time, so why would you run a multivariate test, and is it a good idea?

If you know that you want to test multiple elements of a webpage and you simply can’t, or won’t, wait to run one test at a time, then a multivariate test might be the right option. However, this does mean that you are going to need more variants in order to cover all of the possible combinations.

For the mathematically-minded, the formula is below:

Number of versions of element 1 × number of versions of element 2 = number of variants required

In other words, if you want to test two versions of the image and two versions of the CTA then you need four variants altogether. And if there were three versions of each then that’s nine.
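If you want to sanity-check that count, a couple of lines of Python (with hypothetical element names) will enumerate the combinations for you.

```python
from itertools import product

# Hypothetical elements under test and their versions
images = ["hero_photo", "product_shot"]
ctas = ["Buy now", "Learn more"]

variants = list(product(images, ctas))
print(len(variants))  # 2 x 2 = 4 variants required
for image, cta in variants:
    print(image, "+", cta)
```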

But the biggest concern here is not the workload; it’s that by splitting your audience four ways (or five, including the control group) you won’t get enough data on each variant to reliably choose a winner.

Often what happens is that a brand will start their multivariate test only to realise that they’ve stretched their audience too thin, and it takes two or three times longer to get the data they need.

But if you are confident that you can reach critical mass quickly enough, then go for it.

One-off A/B test vs testing an ongoing campaign

Amongst our own clients, who have the ability to run comprehensive A/B tests from our multichannel engagement platform, there are essentially two broad camps: one-off tests for a single, one-time-only campaign and continuous testing of ongoing campaigns.

Both are equally valid.

However, we recommend that the learnings from a one-off test should be documented somewhere and referred back to when creating a similar campaign. So, by and large, does including the recipient’s name work well for us? Do we see better results from campaigns sent before lunch or after? 

Over time, you can build up a solid bank of data around what typically works best for your brand’s audience across different channels. Ideally, with each new campaign you are starting from a stronger, more assured place.

However, in our experience the impact of A/B testing is typically best felt when optimising ongoing, year-round campaigns and journeys. The classic example here is an onboarding campaign or welcome message.

These are going to be shown to most, if not all, of your customers so you have an opportunity to run multiple tests on various aspects of it throughout the year.

And of course, the rewards for optimising it are always going to be greater. 

We particularly recommend that any event-triggered campaign (e.g. a cart recovery or win-back message) should be the subject of deep testing. This is especially important in the early days of a new campaign when you are trying to achieve optimal performance as quickly as possible.

Wrapping up

The Xtremepush platform features marketer-friendly A/B testing tools for all of the channels we offer. We’re making it simpler for brands to optimise their campaigns and deliver their core business objectives.