
A/B Testing

Mazher Khan - IIT (BHU) - B.Tech (DR-2)
Senior Data Analyst @Target | Ex - OLX (EU)
YouTube - 2.2M+ (Views) | LinkedIn - 16k+

Telegram Link - https://t.me/+XTjv6r80eDc5ZWU1
Practice Workbook - 100 Days Challenge - https://docs.google.com/spreadsheets/d/1eP8evU2JIsawAVJ7GH_NNd_2xNLOT7abpJTs5O9iUQI/edit#gid=775777503
Follow me on LinkedIn - https://www.linkedin.com/in/mazher-khan/
Follow on Instagram - https://www.instagram.com/khan.the.analyst
Book 1:1 Mentorship Plan (1, 3, 6 months) - https://www.preplaced.in/profile/mazher-khan
Book for Career Guidance, CV review & interview tips - https://topmate.io/mazher_khan
Follow on YouTube - https://youtube.com/@imzhr.?si=KdMGmWt-vTy12hxV

Statistical significance is a term used to indicate whether the results of a study are likely due to
chance or are actually meaningful. In the context of A/B testing, statistical significance is used
to determine whether the observed difference in conversion rates between two groups is real or
simply the result of random variation.

1. Define the hypothesis:


a. Null hypothesis: There is no difference in conversion rates between the two designs.
b. Alternative hypothesis: There is a difference in conversion rates between the two designs.
2. Determine the sample size: Estimate the number of users required to detect a significant
difference in conversion rates between the two designs (see the sketch after this list). General approach:
a. Determine the expected effect size: The effect size is the difference in conversion
rates between the two designs that you expect to see. This can be based on
previous studies, industry benchmarks, or expert opinions.
b. Use an online sample size calculator: Enter inputs such as the confidence level and the population (traffic) size.
c. Choose the desired level of statistical power: Statistical power is the probability
of correctly rejecting the null hypothesis when it is false. A higher statistical
power increases the chance of detecting a real effect. A commonly used
statistical power level is 80%.
d. Set the significance level: The significance level is the probability of incorrectly
rejecting the null hypothesis when it is true. The standard significance level is 5%.
3. Randomly assign users to the two groups:
a. Use a random number generator: Assign each user a random number between 0 and 1;
users with a value <= 0.5 go to the test group and users with a value > 0.5 go to the control group.
b. Use stratified random sampling: If there are certain user characteristics that may
influence the outcome of the test (e.g., age, gender, location), you can use
stratified random sampling to ensure that the test and control groups are similar
in terms of these characteristics. First, divide the users into strata based on the
characteristic of interest (e.g., age groups). Then, randomly assign users within
each stratum to the test and control groups.
c. Use third-party tools: There are many third-party tools available that can help with
random assignment, such as Google Optimize or Optimizely. These tools can help
ensure that the random assignment is unbiased and that the test and control
groups are properly balanced.
4. Collect data: Collect data on the number of users who visit the page and the number of
users who make a purchase for each group.
5. Analyze the data: Analyze the data using a hypothesis test such as a t-test or a
chi-square test. If the p-value is less than 0.05, reject the null hypothesis and conclude
that there is a significant difference in conversion rates.
a. Calculate the conversion rates and check significance through an online tool:
Calculate the conversion rate (i.e., the proportion of users who made a purchase)
for each group.
b. Calculate the p-value: For example, with a t-test (see the sketch after this list).

6. Draw conclusions: If the null hypothesis is rejected and the new design's conversion rate is
higher, you can conclude that the new design performs better than the current design in
terms of conversion rate. If the null hypothesis is not rejected, you cannot conclude that
there is a difference in conversion rates between the two designs.
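A minimal Python sketch of steps 2 through 6 above, assuming the numpy and statsmodels libraries are available. All rates, counts, and the random seed are illustrative placeholders rather than real experiment data, and a two-proportion z-test stands in for the t-test / chi-square test mentioned above (for two groups the chi-square test gives an equivalent result).

import numpy as np
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize, proportions_ztest

# Step 2: sample size per group at 5% significance and 80% power
baseline_rate = 0.10       # current design's conversion rate (assumed)
expected_rate = 0.12       # conversion rate expected from the new design (assumed)
effect_size = proportion_effectsize(expected_rate, baseline_rate)
n_per_group = NormalIndPower().solve_power(effect_size=effect_size,
                                           alpha=0.05, power=0.80, ratio=1.0)
print(f"Required sample size per group: {int(np.ceil(n_per_group))}")

# Step 3a: random assignment - test group if the random draw is <= 0.5, else control
rng = np.random.default_rng(seed=42)
user_ids = np.arange(10_000)
groups = np.where(rng.random(len(user_ids)) <= 0.5, "test", "control")

# Steps 4-5: once conversions are collected, compare the two proportions
conversions = np.array([310, 270])   # purchases in test and control (illustrative)
visitors = np.array([2500, 2500])    # visitors in test and control (illustrative)
z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# Step 6: decision at the 5% significance level
if p_value < 0.05:
    print("Reject the null hypothesis: the conversion rates differ significantly.")
else:
    print("Fail to reject the null hypothesis: no significant difference detected.")

Replacing the illustrative conversion and visitor counts with the numbers collected in step 4 reproduces the decision rule in steps 5 and 6.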

Type 1 error (False Positive): When you reject a true null hypothesis. In other words, you
conclude that there is a significant difference between the two groups when, in reality, there is
no difference.

Type 2 error (False Negative): When you fail to reject a false null hypothesis. In other words, you
conclude that there is no significant difference between the two groups when, in reality, there is
a difference.

For a fixed sample size and effect size, the relationship between Type 1 and Type 2 errors is
inverse: as the probability of making a Type 1 error decreases, the probability of making a
Type 2 error increases, and vice versa. Therefore, it's important to balance the risk of these
two errors in hypothesis testing (the sketch after the list below illustrates this trade-off).

1. Decide which type of error is more important: Understanding the consequences of each type of
error can help you determine the appropriate balance between them.
2. Consider the effect size: The effect size is the magnitude of the difference between the
two groups being tested. Larger effect sizes are easier to detect, which reduces the risk
of both Type 1 and Type 2 errors. If you have a small effect size, you may need a larger
sample size to achieve the desired balance between Type 1 and Type 2 errors.
3. Adjust the significance level: The significance level (alpha) determines the probability of
making a Type 1 error. A lower significance level reduces the risk of Type 1 errors but
increases the risk of Type 2 errors. Conversely, a higher significance level increases the
risk of Type 1 errors but decreases the risk of Type 2 errors. You can adjust the
significance level based on the importance of each type of error and the consequences
of making them.
4. Increase the sample size: Increasing the sample size can reduce the risk of Type 2
errors, but it also increases the cost and time required to conduct the experiment. The
optimal sample size depends on the effect size, the significance level, and the desired
balance between Type 1 and Type 2 errors.
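A small sketch of the alpha/beta trade-off described above, assuming statsmodels; the effect size and per-group sample size are illustrative assumptions. Lowering the significance level (alpha) reduces the Type 1 error risk but raises the Type 2 error rate (1 - power):

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect_size = proportion_effectsize(0.12, 0.10)   # assumed lift from 10% to 12%
n_per_group = 2000                                # assumed fixed sample size

for alpha in (0.10, 0.05, 0.01):
    # power = probability of detecting the assumed effect at this alpha
    power = NormalIndPower().power(effect_size=effect_size, nobs1=n_per_group,
                                   alpha=alpha, ratio=1.0)
    print(f"alpha = {alpha:.2f} -> power = {power:.2f}, Type 2 error (beta) = {1 - power:.2f}")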

How do you handle inconclusive results in an A/B test?

1. Review the data: Check for any patterns, trends, or anomalies.
2. Re-evaluate your hypothesis:
3. Extend the test:
4. Consult with experts:

How long should an A/B test be run?

1. The length of time an A/B test should be conducted depends on several factors:
a. Sample size, the effect size, and the amount of traffic on your website.
2. As a general rule, you should run an A/B test long enough to collect sufficient data to
ensure that the results are statistically significant and not due to chance.
3. If you stop the test too soon, you may end up with inconclusive or unreliable results. On
the other hand, if you run the test for too long, you may end up wasting resources and
delaying the implementation of changes that could drive improvements in your business
metrics.
4. Once you have determined the minimum sample size, you can estimate how long it will
take to collect that amount of data based on your website traffic and conversion rates
(see the sketch after this list).
5. In general, a minimum test duration of two weeks is recommended to ensure that the
results are reliable. However, the optimal test duration will depend on your specific
circumstances, so it's important to use a sample size calculator and consult with experts
if you're unsure.
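A back-of-the-envelope duration estimate in the spirit of points 4 and 5, assuming a required per-group sample size from a prior power calculation and an illustrative daily traffic figure (both are placeholder values):

import math

required_per_group = 3900     # assumed output of a sample size calculation
daily_visitors = 1000         # assumed number of eligible visitors per day
split_fraction = 0.5          # half of traffic goes to each group

days_needed = math.ceil(required_per_group / (daily_visitors * split_fraction))
print(f"Estimated duration: {days_needed} days; run at least ~14 days to cover full weekly cycles")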

How do you analyze the results of an A/B test?

1. Calculate the difference between the control and treatment groups (see the sketch after this list):


2. Check for statistical significance:
3. Consider practical significance:
4. Look at other metrics:
5. Review any anomalies: Look for any anomalies or unexpected results that may be worth
investigating further. For example, did a particular demographic group respond
differently to the changes you made?
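A minimal sketch of steps 1 and 2 above, assuming statsmodels; the conversion and visitor counts are illustrative placeholders. The confidence interval for the difference also helps judge practical significance (step 3):

from statsmodels.stats.proportion import proportions_ztest, confint_proportions_2indep

# Illustrative counts: (conversions, visitors) for treatment and control
conv_t, n_t = 310, 2500
conv_c, n_c = 270, 2500

rate_t, rate_c = conv_t / n_t, conv_c / n_c
lift = (rate_t - rate_c) / rate_c   # relative lift of treatment over control

z_stat, p_value = proportions_ztest([conv_t, conv_c], [n_t, n_c])
ci_low, ci_high = confint_proportions_2indep(conv_t, n_t, conv_c, n_c)  # 95% CI for the difference

print(f"Absolute difference: {rate_t - rate_c:.4f} | relative lift: {lift:.1%}")
print(f"p-value: {p_value:.4f} | 95% CI for difference: ({ci_low:.4f}, {ci_high:.4f})")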

What are some common pitfalls to avoid when running an A/B test?

1. Lack of a clear hypothesis:


2. Running the test for too short a time:
3. Not controlling for external factors: External factors like seasonal variations, changes in
user behavior, or other changes to your website can all affect the results of your A/B test.
4. Using small sample sizes:
5. Testing too many variables at once: Testing too many variables at once can make it
difficult to determine which changes led to any observed differences between the control
and treatment groups. To get meaningful results, it's best to test one variable at a time.
6. Ignoring the practical significance of the results: While statistical significance is
important, it's also important to consider the practical significance of any observed
differences in the key metric. Just because a difference is statistically significant doesn't
mean it's meaningful in the context of your business goals.
