There’s something very romantic – in a business sense – about intuition and making gut-based decisions. After all, Steve Jobs himself was an advocate for following your heart. But while intuition may serve you well up to a point, there comes a time in every business journey when decisions need to be based on data. And that’s where email A/B testing comes in.
Data-driven organisations are three times more likely to report significant improvements in decision-making, compared to those who rely less on data. Rather than relying on a hunch regarding who the customer is, what messaging they respond to, and what journey they should take, these companies make their decisions based on data.
With so much mobile app marketing now automated, it has become possible to conduct tests on a massive scale. This, in turn, ensures that the data produced is reliable, confirming that X does affect Y.
Email A/B testing – what should you test?
The number of variables that can be tested within an app business is almost infinite – from targeting specific audiences, to adaptations in tone of voice, to small design changes, every decision you make in marketing can be tested to find the optimal version.
Take a call-to-action (CTA) button – you can test whether increasing its size increases the click-through rate (CTR). Once you’ve found the best size, you can iterate on that to find out which colour works best, and then experiment to find the best copy to use. The testing process for this one button alone could take several months, but if you end up with a CTA button with an excellent CTR (which you can roll out across your business), it will have been worth it.
Now think about that level of experimentation for the marketing decisions you make on a daily basis, and you’ll have scratched the surface of the potential for app marketing automation experimentation.
Introducing email A/B testing for mobile apps
In order to draw robust conclusions from your app experiments, you need to compare two versions: a control version (the original) and a test version (an alternative to the original).
This is split testing, sometimes called A/B testing, and it allows you to test a variety of options to find the one that performs best. Take the email subject line – tests might include:
- A short, sharp subject line (A, the original version)
- A more descriptive and explanatory subject line (B, the new version)
- A subject line with emojis versus one without
- A subject line personalised to the recipient versus a generic one
In each instance, you will funnel a proportion of your users into Experience A and the rest into Experience B, measure how many people in each group take the desired action (in this case, open the email), and compare the results.
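To make that concrete, here’s a minimal Python sketch of how the split and comparison might work. The user IDs and the set of open events are hypothetical stand-ins for whatever your email platform actually reports back:

```python
import random

def split_test(recipients, ratio=0.5, seed=42):
    """Randomly assign recipients to Experience A or Experience B."""
    rng = random.Random(seed)
    shuffled = list(recipients)
    rng.shuffle(shuffled)
    cutoff = int(len(shuffled) * ratio)
    return shuffled[:cutoff], shuffled[cutoff:]

def open_rate(group, opened_ids):
    """Share of a group that opened the email."""
    return sum(uid in opened_ids for uid in group) / len(group) if group else 0.0

# Hypothetical data: user IDs plus the open events your email platform reports back
recipients = [f"user_{i}" for i in range(10_000)]
group_a, group_b = split_test(recipients)
opened_ids = set()  # populate from your email service provider's open tracking
print(f"A open rate: {open_rate(group_a, opened_ids):.2%}")
print(f"B open rate: {open_rate(group_b, opened_ids):.2%}")
```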
A/B testing and dynamic data
A/B testing works hand in hand with dynamic data, which we wrote about here. This type of data has the power to personalise your marketing campaigns, pulling in elements that are personal to the user, such as the localised language in a language-learning app or the TV show they’ve most recently watched from your streaming app.
Personalised marketing yields better results than “one-size-fits-all” campaigns, but even within personalisation, some types of content will resonate more than others. For example, if you were to test an email campaign for a streaming service, would the subscription rate be higher when users were shown images of the TV series they’ve most recently binged, or of the film they’ve watched time and again? By A/B testing which dynamic data to use, you can find the content most likely to encourage engagement from these customers.
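As an illustration, here’s a hypothetical sketch of how the dynamic element itself can become the thing you test. The field names (`last_binged_series`, `most_rewatched_film`) are assumptions standing in for whatever data your streaming app actually holds:

```python
# Hypothetical user profiles pulled from a streaming app's data
users = [
    {"id": "user_1", "last_binged_series": "Series X", "most_rewatched_film": "Film Y"},
    {"id": "user_2", "last_binged_series": "Series Z", "most_rewatched_film": "Film W"},
]

def hero_image(user, variant):
    """Choose which piece of dynamic data fills the email's hero image slot."""
    if variant == "A":
        return f"{user['last_binged_series']} artwork"   # most recently binged series
    return f"{user['most_rewatched_film']} artwork"      # most rewatched film

for index, user in enumerate(users):
    variant = "A" if index % 2 == 0 else "B"  # alternating assignment, for illustration only
    print(user["id"], variant, hero_image(user, variant))
```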
Email A/B testing: Step-by-step
1. Identify your priorities
Analyse your current campaign and app engagement metrics. Questions you should be asking yourself include:
- What problem can you tackle that will have the largest impact on your business?
- Are there any quick wins you can identify?
- What challenges are you facing?
- Are you struggling to convert sign-ups into paid users?
- Do your paid ads have a low click-through rate?

2. Hypothesise on the result
Once you’ve identified what you want to test, consider the result you want to achieve and the impact it might have. It’s important to formalise your hypothesis by writing it down – for example, “Personalising the subject line will increase the open rate by at least two percentage points” – so that you can return to it at the end and evaluate it as true or false.

3. Plan your test
Decide on the specifics of what’s going to change between Experience A and B – such as copy, colour or imagery – as well as when you’re going to run the test and who you’re going to test it on (more on this shortly).
When planning your test, you need to keep all the other variables the same in order to get a result you can confidently act on. If you test two things at once, or change the conditions between the A and B tests, it’ll be impossible to know which change was responsible for the end result. (This won’t be entirely possible, because while you can control, for instance, what time you send out your emails, you can’t control when your subscribers choose to read them.)

4. Collect your data
It’s important that you collect enough data, which means you need a large enough sample size (simply the number of people who participate in the test). You can use this calculator to find out the minimum number of people you need in each test group.
This will partly be determined by how many users you have, but also by what you’re testing. If it’s something like an email, with a finite audience, you can simply split your audience list in half. If you’re testing something like an app or website, you’ll need to run the test for long enough to collect an adequate number of results.
Don’t be afraid to run your test until you’ve collected enough data – stopping too soon will render your results invalid.
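If you’d like to see roughly what such a calculator is doing under the hood, the sketch below uses one common approximation for comparing two conversion rates. The baseline and expected rates are hypothetical – swap in your own numbers:

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate minimum sample size per group for detecting a change
    between two conversion rates (a common two-proportion approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    p1, p2 = baseline_rate, expected_rate
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(((z_alpha + z_power) ** 2 * variance) / (p1 - p2) ** 2)

# Hypothetical example: detecting a lift from a 20% open rate to 22%
print(sample_size_per_group(0.20, 0.22))  # roughly 6,500 recipients per group
```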
Pro tip: Collect qualitative data (descriptive) while you’re collecting quantitative data (numerical) by conducting user interviews. While vast amounts of convert/did not convert data will show you whether X influences Y, by actually speaking to your users you’ll get invaluable insight into why people are making the choices they’re making.

5. Report and analyse your results
Having formalised what, when, and how you were going to test, you’ll now be able to add the results alongside this. If you’ve found that, yes, making a change does lead to an increase in conversions, you’ve reached the most important question: are your results reliable enough to act on that change?
Buckle up – this is where we fall into the realm of statistical significance.
You may remember it from maths lessons of yore – it’s simply a measure of how likely it is that the difference you’ve observed is down to the change you made, rather than random chance. The higher the level of statistical significance, the more confident you can be that the result is real.
In order to find out if your results are reliable, you need to use a statistical significance calculator (or actual maths. But we’d recommend the calculator).
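If you are curious about that maths, here’s a minimal Python sketch of a standard two-proportion z-test – the conversion counts below are purely hypothetical:

```python
import math
from statistics import NormalDist

def two_proportion_p_value(conversions_a, n_a, conversions_b, n_b):
    """Two-sided p-value for the difference between two conversion rates
    (standard two-proportion z-test)."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical result: 2,200 conversions from 10,000 sends (A) vs 2,350 from 10,000 (B)
p_value = two_proportion_p_value(2_200, 10_000, 2_350, 10_000)
print(f"p-value: {p_value:.3f}")  # roughly 0.011, i.e. significant at the 95% level
```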

And while a 1% uplift doesn’t seem like a lot, extrapolated across your whole user base even small gains can really move the needle – a 1% increase in conversions on a list of 500,000 subscribers, for example, is 5,000 extra conversions on every send.
6. Implement the winning variation
If you’re confident in your results, you can roll out the winning variation to all users. If the results look promising but aren’t significant, you could decide to rerun the test, or if they make no real difference to conversion rates you can move on to test something else, keeping things as they are.

7. Iterate
Here’s where you enter a continual optimisation loop, whereby you keep testing until you reach the best possible version of your email, push notification, or other user experience.
Obviously, this iteration is a long-term process; something an app business will commit to for many months or even years. But there are efficiencies to be found. By sharing the results across teams, you can apply learnings across the business – if it’s discovered that users respond well to a certain type of photography, this knowledge can not only be applied to all email campaigns but to paid advertising, app screengrabs and more.

A word on multivariate testing
So that’s A/B tests, but what about the Cs, Ds, Es, and Fs? Are there situations where it’s possible to test more than two variants at a time?
Here’s where multivariate testing comes in: it tests more variables at once and shows how they interact with each other. This can be particularly useful for personalised marketing campaigns for mobile apps.
In the previous examples, where you run separate tests for the personalisation of subject lines, email copy and email images, you could instead produce variants that cover every combination of the above, allowing you to measure how each element contributes to the conversion rate. This would let you find the optimal combination much more quickly than running those A/B tests individually.
However, this testing requires a lot of data. If you’re producing, for instance, 25 variations of an email and you need a sample size of at least 1,000 people for each, you need a mailing list of 25,000 or more.
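A quick way to see how the combinations multiply is simply to enumerate them, as in this sketch – the elements and the 1,000-per-variant minimum are illustrative assumptions:

```python
from itertools import product

# Hypothetical elements to combine in a multivariate email test
subject_lines = ["Short and sharp", "Descriptive", "With emoji"]
hero_images = ["last binged series", "most rewatched film"]
cta_copy = ["Watch now", "Continue watching", "Pick up where you left off", "See what's new"]

variants = list(product(subject_lines, hero_images, cta_copy))
print(len(variants), "variants")                 # 3 x 2 x 4 = 24 combinations

min_sample_per_variant = 1_000                   # assumed minimum per variant
print(len(variants) * min_sample_per_variant)    # 24,000 recipients needed at minimum
```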
Conclusion
Given the number of elements that can – and should – be tested within app marketing automation, it’s fair to say that this type of experimentation is not a quick win. Rather, it’s an infinite feedback loop that should constantly improve performance. And since it’s unlikely that any business will ever reach 100% conversion for every campaign and every customer touchpoint, it should be thought of as a journey, not a destination.