How many emails do you generally get in a single day? The answer is probably at least 20 to 50. In today’s digital world, consumers receive a high volume of email, which makes it a great medium for A/B testing. As a reminder, A/B testing (sometimes called split testing) is the process of showing two or more variants of the same digital media (a website or an email) to different segments of an audience at the same time and comparing which variant drives more engagement. Email A/B testing can help you learn more about your audience and tailor your communications to their needs. In this blog post we will outline our method, best practices, and some lessons learned while email A/B testing at the Consumer Financial Protection Bureau (CFPB).
To begin, we will discuss our method. When starting any A/B test, whether on our website or via email, we want to have a well-defined test strategy. Our strategy must include a clear understanding of WHAT we’re testing and WHY, always keeping in mind who our audience is and the goal that we want them to achieve. Some of the lessons learned from past email A/B testing have been:
- test the right variants,
- don’t have too many variants,
- and help yourself out by using a template.
Testing the right variants means matching what you change to the metric you want to move. If you’re interested in increasing open rates, you probably will not want to test adding a picture to the email content, since the user has to open the email before they can see the picture. Better tests would be comparing subject lines to see how they affect open rates, or comparing pictures to see whether one increases engagement with a specific call-to-action.
We also learned that narrowing down the number of variants to test is very important for statistical significance. If you test too many variants and spread your segmented visitors too thin, you may never reach statistical significance and therefore cannot interpret the data accurately.
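To see why spreading a list across too many variants hurts, here is a minimal sketch in Python using a standard two-proportion z-test. The function name, the list size of 4,000, and the 20% and 25% open rates are made-up assumptions for illustration; they are not CFPB data or the Bureau's tooling.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(opens_a, sent_a, opens_b, sent_b):
    """Two-sided p-value for the difference between two open rates.

    Hypothetical helper: a pooled two-proportion z-test, which is one
    common way to check an A/B result, not necessarily the Bureau's.
    """
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    # Pooled open rate under the assumption that both variants perform the same.
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumed list of 4,000 recipients, same observed rates (20% vs. 25%).
# Split two ways: 2,000 recipients per variant.
p_two_variants = two_proportion_p_value(400, 2000, 500, 2000)
# Split ten ways: only 400 recipients per variant.
p_ten_variants = two_proportion_p_value(80, 400, 100, 400)

print(f"2 variants:  p = {p_two_variants:.4f}")
print(f"10 variants: p = {p_ten_variants:.4f}")
```

With these made-up numbers, the same five-point difference in open rate clears the commonly used 0.05 threshold when the list is split two ways, but not when it is split ten ways.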
Finally, having a process and capturing past research is helpful when developing new tests, because it shows what has and has not worked already. Past research can also inform best practices for future emails; for example, emails sent from a personal-looking email account perform better than emails sent from an agency account.
Having a grand idea is always great, but in addition to our strategy, we have to ensure that the test can be executed successfully. Questions we may ask ourselves:
- Do I have the right tools?
- Is what we are interested in testing actually feasible?
- Are there technology limitations?
- Will my listserv generate enough traffic to even reach statistical significance?
Every test should be thoughtfully planned and given enough time to be set up and run successfully. On the question of time and statistical significance, there are several calculators online that can help you estimate how much traffic you’d need to reach it. We have been fortunate enough to work with our data scientists at the Bureau to create our own calculator, but you can also search online for “A/B Split Multivariate Test Duration Calculator” and see which one will work best for you. Keep in mind: if you have a test that fails to reach statistical significance, you are still learning. Learn from the hiccups and keep trying more tests!
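For a rough sense of what such a calculator does, the sketch below uses the textbook sample-size formula for comparing two proportions. It is an illustration under stated assumptions, not the Bureau's calculator: the function name, the default 5% significance level and 80% power, and the example open rates are all hypothetical choices.

```python
import math
from statistics import NormalDist


def recipients_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Estimate recipients needed per variant to detect a change in rate.

    Hypothetical helper based on the standard two-sided, two-proportion
    sample-size formula; alpha and power defaults are assumptions.
    """
    if baseline_rate == expected_rate:
        raise ValueError("The two rates must differ to estimate a sample size.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    p_bar = (baseline_rate + expected_rate) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(
            baseline_rate * (1 - baseline_rate)
            + expected_rate * (1 - expected_rate)
        )
    ) ** 2
    return math.ceil(numerator / (expected_rate - baseline_rate) ** 2)


# Assumed example: a 20% open rate today, hoping to detect a lift to 25%.
needed = recipients_per_variant(0.20, 0.25)
print(f"Recipients needed per variant: {needed}")
print(f"Recipients needed for a two-variant test: {2 * needed}")
```

Under these assumptions the estimate comes to about 1,100 recipients per variant. Smaller expected differences push the number up quickly, which is one way to check whether a given list can support the test at all.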
If we are able to successfully run a test, the last step is analysis and applied learning. The goal of every A/B test is to find which variant drives more engagement, and then implement that change. However, not every successful test should be thought of as “one and done.” What you learned in one email A/B test may not be as helpful in another email aimed at an entirely different audience. Over time, your audience could also evolve; what resonates with them today may not in six months. That’s why it is important to test continually. Be curious, but also keep track of all of the email A/B tests that you have done and review them. When you are planning new tests for the quarter or for the year, think about what you have done already: what worked, what didn’t, and what you could refine and test again. There is always room to keep learning.