A/B-testing your marketing emails can be a profitable game-changer. In fact, there is no better way to determine the impact a design, copy, or scheduling change can have on the success of your email campaigns.
But if you're unfamiliar with A/B testing or you're just getting started, it can be an overwhelming project to take on. Here's a list of do's and don'ts to steer you in the right direction.
1. Do run tests that have recurring benefits
If you've done your research, you know you can test a myriad of variables in your emails. And although some of them might be intriguing (what would happen if you alternated capitals and lowercase in the headline?), some tests are unnecessary. Don't rely solely on how-to-A/B-test articles like this one to inform your testing schedule. You can, however, use them for inspiration to find factors that are relevant to your specific audience.
For example, if you're testing images for a sports apparel company, you could test variables such as product images on backgrounds, using models in your images, or showing posed shots versus action shots. (You can use a free photo editing tool, such as Pixlr, to manipulate your images.)
Try to choose global factors that will apply to all of your future emails before worrying about smaller considerations that won't appear too often in your campaigns.
Be patient with yourself here: A/B testing is a science; learning what to A/B-test is an art.
2. Don't test factors that are one-email specific
Sometimes a one-off email has an exciting topic or presents an opportunity for a new A/B test. But if you send a Happy New Year's email only once a year, and you're not advertising products in it, don't bother testing the size of your snowflake graphic.
Testing changes that have recurring applications means they'll have a bigger impact on your business. Focus on tests that may be less exciting in the moment but are much more likely to have an impact in the long run.
3. Do test like a scientist
Do you remember your middle school science fair projects? Ms. Morgan told the class that you needed a control factor, a variable factor, and a hypothesis to make a test credible. Well, class is back in session. Make sure you isolate all other factors as much as possible when creating your A and B tests. Preferably, your A test is the control, and is representative of your current email. The B test is almost identical, but altered by one factor.
So if you're testing subject line length, here is a bad test:
A: Get a move on, pal!
B: T-SHIRTS FROM $4.99 ON SALE NOW DON'T MISS OUT
Those two differ in length, punctuation, capitalization, and content.
A better test would be:
A: Pick up T-shirts from $4.99
B: Pick up T-shirts from $4.99 in dozens of styles & colors
Test one thing at a time and keep your sets of recipients random.
4. Don't use before and after tests
Testing one week against another may seem like an OK test as long as all other factors stay the same... but it's inherently flawed. The week the email is sent could itself be the reason it does better or worse, skewing the results of your test.
For an A/B test to be truly valid, you must send the emails at the same time on the exact same day (excluding, of course, tests where the test factor is delivery day of the week or time).
5. Do read data as observations
Data can get confusing if you leave it as raw numbers. Test B resulted in a 5% lower open rate but an 8% higher click-through rate. Is that good or bad? Use plain-English statements to communicate what happened.
If you are looking at opens: For everyone who received the email, did this test improve their open rate?
If you are testing for an internal element:
For everyone who opened the email, did this test improve the click-through rate?
For everyone who opened the email, did the test drive more orders?
For everyone who clicked through the email, did this test change their likelihood of ordering?
Breaking down your results in plain English clears up what actually changed and which information is unrelated to your A/B test.
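As a small illustration, here is a short Python sketch, with made-up counts, of how those plain-English statements fall out of the raw numbers once each rate is conditioned on the step before it.

```python
# A small sketch with made-up counts: each rate is conditioned on the step
# before it, which is what turns raw campaign numbers into plain-English observations.
received, opened, clicked, ordered = 50_000, 11_000, 900, 140

print(f"Of everyone who received the email, {opened / received:.1%} opened it.")
print(f"Of everyone who opened it, {clicked / opened:.1%} clicked through.")
print(f"Of everyone who clicked through, {ordered / clicked:.1%} placed an order.")
```

Framed this way, a drop at one step can't masquerade as a win at another.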
6. Don't call it too early
It's great to be so excited for the results to pour in that you're refreshing MailChimp (or "MailKimp" for you Serial listeners) every 30 seconds. But the first two hours of campaign responses are not representative. Early responders may react differently than those who take a while to find and open your email.
A good standard is to wait at least 24 hours before examining your data. (But the cool kids give it at least 48.)
7. Do give yourself a reality check
Don't run a test you think will be statistically insignificant; it wastes precious time and resources. Or, if you're not sure, save those tests for a rainy-day email when you don't have anything else to test.
So, what makes a test statistically significant? It means enough people have changed their behavior that you can be 95% sure the difference in results is not random. For example, if you have 100,000 subscribers and get around 200 orders from an email, how many new orders would make the results significant?
A quick way to find out whether you've got something big on your hands is to run the numbers through a free tool such as VWO's A/B test significance calculator. Here, your control should be your A test (say, 50,000 visitors and 100 orders), and the variation should be your B test (say, 50,000 visitors and 125 orders).
Leave the math up to the machine. If the result comes back "Yes!", you've got yourself a change that's statistically significant.
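If you'd like to see what the machine is doing, here is a minimal Python sketch of a one-tailed two-proportion z-test, which is the kind of check significance calculators run; the recipient and order counts are the hypothetical figures from the example above, and real calculators may use slightly different conventions (one- versus two-tailed, for instance).

```python
# A minimal sketch of the significance check behind A/B test calculators,
# assuming a one-tailed two-proportion z-test. The counts below are the
# hypothetical example numbers from the text, not real campaign data.
from math import sqrt, erf

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def ab_significance(n_a, orders_a, n_b, orders_b, confidence=0.95):
    """Return (p_value, significant) for a one-tailed 'B beats A' comparison."""
    p_a = orders_a / n_a                          # conversion rate of the control (A)
    p_b = orders_b / n_b                          # conversion rate of the variation (B)
    pooled = (orders_a + orders_b) / (n_a + n_b)  # combined rate if A and B were equal
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # standard error of the difference
    z = (p_b - p_a) / se                          # how many standard errors B sits above A
    p_value = 1 - normal_cdf(z)                   # chance of seeing this lift by luck alone
    return p_value, p_value < (1 - confidence)

# Control: 50,000 recipients and 100 orders; variation: 50,000 recipients and 125 orders.
p_value, significant = ab_significance(50_000, 100, 50_000, 125)
print(f"p-value: {p_value:.3f} -> significant at 95%? {significant}")
```

With these particular counts the p-value comes out just under 0.05, so the 25 extra orders barely clear the 95% bar.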
8. Don't over-test for your sample size
Know the limits of your email list. If you have 25,000 subscribers, you won't be able to run four variations in a single week. Breaking the list apart that much would skew your data and prevent you from seeing real results.
There's no hard-and-fast rule for the number of subscribers versus the number of variations, but the more you break apart your sample, the less significant the results of each test become. Again, use the VWO calculator to see what kind of results you need for each variation.
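To make the trade-off concrete, here is a rough sketch using the same z-test idea and made-up numbers (a 2% order rate and a 20% lift): the very same lift produces weaker and weaker evidence as a 25,000-person list is split into more variations.

```python
# A rough sketch, with made-up numbers, of why splitting a small list into many
# variations buries a real effect: each group shrinks, so the evidence weakens.
from math import sqrt, erf

def one_tailed_p(n_a, p_a, n_b, p_b):
    """Approximate p-value that the B rate beats the A rate by more than chance."""
    pooled = (n_a * p_a + n_b * p_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 1 - 0.5 * (1 + erf((p_b - p_a) / se / sqrt(2)))

list_size, base_rate, lift = 25_000, 0.02, 1.20   # 2% order rate, 20% relative lift
for variations in (2, 3, 4):                      # 2 variations = a plain A/B test
    group = list_size // variations               # recipients per variation
    p = one_tailed_p(group, base_rate, group, base_rate * lift)
    print(f"{variations} variations ({group:,} each): p-value {p:.3f}, significant at 95%? {p < 0.05}")
```

With a straight A/B split this lift clears the 95% bar, three ways it barely does, and split four ways the same lift no longer registers as significant. The list, not the idea, sets the limit.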
9. Do act on your results
Once you've gotten your data back from the test and translated it into actionable observations, act on it! If the size of the image next to your call to action (CTA) moved click-through rate (CTR) by a significant amount, change the size. Tests that come back significant often point the way to other significant tests.
Next, try testing the color, copy, or placement of the CTA to see what happens. On the other hand, if moving your product images around isn't affecting your results, stop testing it. Once you've sniffed out a good direction of tests, follow the scent. But, remember, one test at a time!
10. Don't assume results are permanent
The very reason for A/B testing is the unpredictable, ever-changing landscape of email, e-commerce, and consumer behavior. With that in mind, don't set A/B test results in stone. You need to not only test a factor multiple times (because environmental factors and chance could create varying results) but also question the results (and re-run the tests) every few months or up to a couple of years, depending on the nature of the factor being tested.
CTA variable findings might last years, whereas "trendy" changes in subject lines (such as personalization) could feel old to recipients in a matter of months. Or maybe your audience has grown sick of emojis. You might want your brand to appeal to a new, younger segment with punchier copy, but if everyone is doing it... it's not cool anymore. And remember, you're working with 95% confidence, not 100%.
Better safe than sorry—so keep testing!