Is Your Data Lying? Find Out With An A/A Test

When you’re working with data, there’s a threat that people hardly ever talk about. It’s not about best practices, platform implementation, or integration with other services. It happens before your team even lays a hand on the A/B testing software.

Sometimes, the problem with data analysis is with the data itself.

A/A testing might sound redundant, but it’s a QA step that you can’t afford to skip. When you run an A/A test, you’re comparing a piece of content to itself — there’s no difference between the control group and the variant. The goal, then, is to see if the testing software provides accurate data in the first place. Ideally, the results of an A/A test will be flat, proving that the software won’t give you false positives when searching for differences in an A/B test.

In a perfect world, you wouldn't need to run an A/A test. But the world isn't perfect, and neither is data. We'll walk you through the most commonly asked questions about A/A testing.

What Is an A/A Test & Why Should I Run One?

An A/A test is a split test where the variant and the control version are identical. You can think of it as an A/B test that’s supposed to return flat results.
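To make the idea of "flat results" concrete, here's a small Python sketch that simulates many A/A tests on two identical audiences and counts how often a standard two-proportion z-test flags a "winner" purely by chance. The 5% conversion rate, group sizes, and 0.05 significance threshold are illustrative assumptions, not Leanplum settings.

```python
import math
import random

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (rate_a - rate_b) / se
    # Normal CDF via erf; p = 2 * P(Z > |z|) for a two-sided test.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(7)
TRUE_RATE = 0.05        # both groups convert at the same rate: it's an A/A test
USERS_PER_GROUP = 2000  # illustrative traffic per group
ALPHA = 0.05            # significance threshold
TRIALS = 1000           # number of simulated A/A tests

false_positives = 0
for _ in range(TRIALS):
    conv_a = sum(random.random() < TRUE_RATE for _ in range(USERS_PER_GROUP))
    conv_b = sum(random.random() < TRUE_RATE for _ in range(USERS_PER_GROUP))
    if two_proportion_p_value(conv_a, USERS_PER_GROUP, conv_b, USERS_PER_GROUP) < ALPHA:
        false_positives += 1

print(f"'Significant' A/A results: {false_positives / TRIALS:.1%}")
```

Even with identical variants and a perfectly calibrated tool, roughly 5% of runs will look "significant" by chance. That's why a single surprising A/A result isn't proof of a broken setup; it's a large, persistent difference that should worry you.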

A/A testing can help you in a couple of different ways:

Calibrating Your Tools

Software isn't foolproof. It's possible that your A/B testing platform simply isn't processing your data correctly. If that's the case, you'll notice that even after a few weeks, there's still a significant difference between two supposedly identical variants.

It’s also possible that there’s an issue with your implementation. If the A/B testing platform wasn’t properly implemented with your app, it might not record data accurately. Along the same lines, A/A testing presents the perfect opportunity to cross-check your testing results with your analytics platform, if you use an external one. Spotting implementation errors early could save countless hours down the road.

Finding the Minimum Sample Size

As an A/B test runs on more users, its statistical significance increases. This means it becomes increasingly unlikely that the results of the test are due to chance alone. Generally, the test is considered complete once it has run on enough users to push the statistical significance above a given threshold.

You won’t normally have to calculate the minimum sample size yourself — tools like Leanplum’s A/B testing platform will automatically show you how long an experiment is likely to take, as shown below.

With our current volume of users, it will take five days to complete our test. As explained in our user guide, a higher probability reduces the chances of Leanplum providing a false negative, while a higher significance reduces the chances of a false positive.

In this example, we can ignore the percentage of change, since the variant and the control are identical. In an A/B test, you would tweak that value to suit the degree of change your hypothesis expects. You need a smaller margin of error to detect a subtle change, so the test will take longer to complete.
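If you want to sanity-check those duration estimates yourself, the sketch below applies the textbook sample-size formula for comparing two proportions. It assumes, for the sake of illustration, that the "probability" setting corresponds to statistical power and "significance" to the confidence level; the 5% baseline conversion rate and the lift values are made-up numbers.

```python
import math
from statistics import NormalDist

def users_per_variant(baseline_rate, relative_lift, significance=0.95, power=0.80):
    """Approximate users needed per variant to detect a given relative lift.

    Uses the standard sample-size formula for a two-proportion z-test.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# A subtler change needs far more users, so the test takes longer to complete.
for lift in (0.20, 0.10, 0.05):
    print(f"Detecting a {lift:.0%} lift on a 5% baseline: "
          f"{users_per_variant(0.05, lift):,} users per variant")
```

Dividing the required sample size by your daily traffic per variant gives a rough duration, which is the kind of arithmetic behind an estimate like the five-day figure above.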

In practice, you can rely on your testing tool's time estimates, but it's valuable to see this information first-hand. You'll witness just how noisy testing results can be before they've reached statistical significance.

How Do I Run an A/A Test?

You run an A/A test the same way you’d run an A/B test, except without the second variant. Here’s an example from the Leanplum dashboard:

As you can see, the Control variant is identical to Variant 1. Since the main objective is to confirm that our tools are working correctly, there's no harm in picking a broad audience (in this case, all of our users). Targeting a wide segment with generic impression criteria means that the A/A test will collect a lot of data in a short time.

Once the test is set up, all that’s left is to wait. In a few days, your analytics dashboard should show that the results of the control and the variant have converged, proving that your data is accurate. If they haven’t, it’ll take further troubleshooting to determine where the problem lies.
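If you'd like to double-check the dashboard's verdict, you can export the raw counts and run the comparison yourself. The numbers below are hypothetical, and the helper is just a plain two-proportion z-test, not a Leanplum API.

```python
import math
from statistics import NormalDist

def p_value(conversions_a, users_a, conversions_b, users_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a, rate_b = conversions_a / users_a, conversions_b / users_b
    pooled = (conversions_a + conversions_b) / (users_a + users_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_a - rate_b) / se
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical counts exported after a few days of an A/A test.
control = {"users": 48_210, "conversions": 2_411}
variant = {"users": 47_985, "conversions": 2_455}

p = p_value(control["conversions"], control["users"],
            variant["conversions"], variant["users"])

# A p-value well above 0.05 is consistent with "no real difference," which is
# what a healthy A/A test should show. A persistently tiny p-value points to an
# implementation or randomization problem worth troubleshooting.
print(f"p-value: {p:.3f}")
```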

What’s the Downside to Running an A/A Test?

A/A testing doesn’t directly harm your app, but it does have an opportunity cost. Time spent configuring your tools is time not spent optimizing your app.

While you could technically run an A/A test in parallel with an A/B test, doing so would make the process more statistically complex. The tests would take longer to complete, and you'd have to discard your A/B test's results if the A/A test showed that your tools weren't properly calibrated.

In a sense, calibration is similar to optimization. It “couldn’t hurt” to calibrate your tools and optimize your app experience, but you have to decide whether the opportunity cost is worth it.

It may not be worth testing every minor tweak to your app if it slows down your workflow, especially as low-traffic tests can take weeks to complete. Likewise, it usually isn't worth repeatedly calibrating your testing tools. The best time to run an A/A test is when you've just integrated a new tool: if everything's in working order, you can safely move on to A/B tests.

Today’s mobile teams espouse a data-driven approach to their work, so it’s natural that they’d want a data-driven approach to their data.

A/A testing probably shouldn’t be a monthly affair, but when you’re setting up a new tool, it’s worth taking the time to test your data. If you intercept bad data now, you’ll be more confident in your testing results months down the line.

Leanplum is the most complete mobile marketing platform, designed for intelligent action. Our integrated solution delivers meaningful engagement across messaging and the in-app experience. We work with top brands such as Expedia, Tesco, and Lyft. Schedule your personalized demo here.
