About
I am a data science practitioner. I have over 10 years of experience in solving complex…
Activity
-
Today's topic is "Meet the Managers: AI." Thanks, Myrto Lalacos for hosting it and guest speakers Rachel Chalmers and Roger Jie Luo. I enjoyed the…
-
During my break, I had the privilege to participate in two women-led events that are highly relevant to my career. I want to give a big shout-out to…
-
Two companies we were involved with from day one, Eve and Magic School, are featured in Bain Capital Ventures and Headline's Top 50 Emerging Vertical…
Experience
Education
Publications
-
Top Challenges from the first Practical Online Controlled Experiments Summit
KDD explorations journal
Online controlled experiments (OCEs), also known as A/B tests, have become ubiquitous in evaluating the impact of changes made to software products and services. While the concept of online controlled experiments is simple, there are many practical challenges in running OCEs at scale. To understand the top practical challenges in running OCEs at scale, representatives with experience in large-scale experimentation from thirteen different organizations (Airbnb, Amazon, Booking.com, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, Yandex, and Stanford University) were invited to the first Practical Online Controlled Experiments Summit. All thirteen organizations sent representatives. Together these organizations tested more than one hundred thousand experiment treatments last year. Thirty-four experts from these organizations participated in the summit in Sunnyvale, CA, USA on December 13-14, 2018.
While there are papers from individual organizations on some of the challenges and pitfalls in running OCEs at scale, this is the first paper to provide the top challenges faced across the industry for running OCEs at scale and some common solutions.
-
A Method for Measuring Network Effects of One-to-One Communication Features in Online A/B Tests
arXiv
A/B testing is an important decision-making tool in product development because it can provide an accurate estimate of the average treatment effect of a new feature, which allows developers to understand the business impact of new changes to products or algorithms. However, an important assumption of A/B testing, the Stable Unit Treatment Value Assumption (SUTVA), is not always valid, especially for products that facilitate interactions between individuals. In contexts like one-to-one messaging we should expect network interference: if an experimental manipulation is effective, members of the treatment group are likely to influence members of the control group by sending them messages, violating this assumption. In this paper, we propose a novel method that can be used to account for network effects when A/B testing changes to one-to-one interactions. Our method is an edge-based analysis that can be applied to standard Bernoulli randomized experiments to retrieve an average treatment effect that is not influenced by network interference. We develop a theoretical model, and methods for computing point estimates and variances of effects of interest via network-consistent permutation testing. We then apply our technique to real data from experiments conducted on the messaging product at LinkedIn. We find empirical support for our model, and evidence that the standard method of analysis for A/B tests underestimates the impact of new features in one-to-one messaging contexts.
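The paper's edge-based estimator is not reproduced here, but the permutation-testing idea it builds on can be illustrated with a minimal unit-level sketch (the function name and toy data are mine, not from the paper; the paper's version permutes in a network-consistent way rather than over plain unit labels):

```python
import numpy as np

def permutation_test(outcomes, treated, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in mean outcomes
    between treated and control units under Bernoulli randomization."""
    rng = np.random.default_rng(seed)
    outcomes = np.asarray(outcomes, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    observed = outcomes[treated].mean() - outcomes[~treated].mean()
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(treated)  # re-randomize treatment labels
        null[i] = outcomes[perm].mean() - outcomes[~perm].mean()
    # p-value: fraction of permuted effects at least as extreme as observed
    p = (np.abs(null) >= abs(observed)).mean()
    return observed, p
```

Under interference, the contrast being tested would be an edge-level quantity rather than this unit-level difference in means, but the resampling logic is the same.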
-
How A/B tests could go wrong: Automatic diagnosis of invalid online experiments
WSDM 2019
We have seen massive growth of online experiments at Internet companies. Although conceptually simple, A/B tests can easily go wrong in the hands of inexperienced users and on an A/B testing platform with little governance. An invalid A/B test leads to bad business decisions, and bad decisions hurt the business. Therefore, it is now more important than ever to create an intelligent A/B platform that democratizes A/B testing and allows everyone to make quality decisions through built-in detection and diagnosis of invalid tests. In this paper, we share how we mined through historical A/B tests and identified the most common causes of invalid tests, ranging from biased design and self-selection bias to attempting to generalize A/B test results beyond the experiment population and time frame. Furthermore, we developed scalable algorithms to automatically detect invalid A/B tests and diagnose the root cause of invalidity. Surfacing invalidity not only improved decision quality, but also served as user education and reduced problematic experiment designs in the long run.
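The paper covers a broad taxonomy of failure modes; one widely used automatic validity check (not necessarily the paper's exact algorithm) is the sample-ratio-mismatch test, which compares observed treatment/control counts against the configured split with a chi-square goodness-of-fit statistic. A minimal sketch, with names and thresholds of my choosing:

```python
def srm_check(n_treat, n_control, expected_ratio=0.5, crit=3.841):
    """Sample-ratio-mismatch check: chi-square goodness-of-fit of observed
    treatment/control counts against the configured split. 3.841 is the
    chi-square critical value for 1 degree of freedom at alpha = 0.05."""
    total = n_treat + n_control
    exp_t = total * expected_ratio
    exp_c = total * (1 - expected_ratio)
    chi2 = (n_treat - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    return chi2, chi2 > crit  # True means the split looks invalid

# A nominal 50/50 experiment that actually landed 50,000 vs 49,000 users:
chi2, invalid = srm_check(50_000, 49_000)  # chi2 ≈ 10.1, flagged invalid
```

A flagged mismatch usually indicates a bug in assignment or logging rather than a real effect, so analysis should stop until the cause is diagnosed.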
-
False Discovery Rate Controlled Heterogeneous Treatment Effect Detection for Online Controlled Experiments
KDD 2018
Online controlled experiments (a.k.a. A/B testing) have become the mantra for data-driven decision making on feature changes and product shipping at many Internet companies. However, it is still a great challenge to systematically measure how every code or feature change impacts millions of users with great heterogeneity (e.g. countries, ages, devices). The most commonly used A/B testing framework in many companies is based on the Average Treatment Effect (ATE), which cannot detect the heterogeneity of the treatment effect on users with different characteristics. In this paper, we propose statistical methods that can systematically and accurately identify the Heterogeneous Treatment Effect (HTE) of any user cohort of interest (e.g. mobile device type, country), and determine which factors (e.g. age, gender) of users contribute to the heterogeneity of the treatment effect in an A/B test. By applying these methods to both simulation data and real-world experimentation data, we show how they work robustly with a controlled low False Discovery Rate (FDR) and, at the same time, provide useful insights about the heterogeneity of identified user groups. We have deployed a toolkit based on these methods, and have used it to measure the Heterogeneous Treatment Effect of many A/B tests at Snap.
-
Evaluating Mobile Apps with A/B and Quasi A/B Tests
KDD 2016
We have seen explosive growth of mobile usage, particularly on mobile apps. It is more important than ever to be able to properly evaluate mobile app releases. A/B testing is a standard framework to evaluate new ideas, and we have seen many of its applications in the online world across the industry [9,10,12]. Running A/B tests on mobile apps turns out to be quite different, largely because we cannot ship code to mobile apps without going through a lengthy build, review, and release process. Mobile infrastructure and user behavior differences also contribute to how A/B tests are conducted differently on mobile apps, which we discuss in detail in this paper. In addition to measuring features individually in the new app version through randomized A/B tests, we have a unique opportunity to evaluate the mobile app as a whole using the quasi-experimental framework [21]. Not all features can be A/B tested due to infrastructure changes and holistic product redesign. We propose and establish quasi-experiment techniques for measuring impact from mobile app releases, with results shared from a recent major app launch at LinkedIn.
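The paper develops matching-based quasi-experimental techniques; as a generic illustration of the quasi-experimental idea (explicitly not the paper's method), a difference-in-differences estimate compares the before/after change for users who adopted the new app version against users who did not, removing trends shared by both groups. The function and the toy numbers are mine:

```python
def diff_in_diff(pre_treat, post_treat, pre_ctrl, post_ctrl):
    """Difference-in-differences: the before/after change for the exposed
    group minus the change for the comparison group."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(post_treat) - mean(pre_treat)) - (mean(post_ctrl) - mean(pre_ctrl))

# Toy numbers: sessions per user before/after an app release
effect = diff_in_diff(pre_treat=[10, 12], post_treat=[15, 17],
                      pre_ctrl=[9, 11], post_ctrl=[10, 12])
# effect = (16 - 11) - (11 - 10) = 4
```

This relies on a parallel-trends assumption between the two groups, which is exactly the kind of bias the paper's model-based matching and validation are designed to address.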
-
From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks
KDD 2015
A/B testing, also known as bucket testing, split testing, or controlled experimentation, is a standard way to evaluate user engagement or satisfaction with a new service, feature, or product. It is widely used by online websites, including social network sites such as Facebook, LinkedIn, and Twitter, to make data-driven decisions. At LinkedIn, we have seen tremendous growth of controlled experiments over time, with now over 400 concurrent experiments running per day. General A/B testing frameworks and methodologies, including challenges and pitfalls, have been discussed extensively in several previous KDD papers. In this paper, we describe in depth the experimentation platform we have built at LinkedIn and the challenges that arise particularly when running A/B tests at large scale in a social network setting. We start with an introduction to the experimentation platform and how it is built to handle each step of the A/B testing process at LinkedIn, from designing and deploying experiments to analyzing them. We then discuss several more sophisticated A/B testing scenarios, such as running offline experiments and addressing the network effect, where one user’s action can influence that of another. Lastly, we talk about features and processes that are crucial for building a strong experimentation culture.
Patents
-
POST-EXPERIMENT NETWORK EFFECT ESTIMATION BASED ON LOGGED MESSAGING EVENTS
Filed US 60352-0367
-
MODEL-BASED MATCHING FOR REMOVING BIAS IN QUASI-EXPERIMENTAL TESTING OF MOBILE APPLICATIONS
Filed US 15/140239
-
MODEL VALIDATION AND BIAS REMOVAL IN QUASI-EXPERIMENTAL TESTING OF MOBILE APPLICATIONS
Filed US 15/140250
-
A/B TESTING ON DEMAND
Filed US 15/140,186
-
Flexible Targeting
Filed US 14/944,100
-
Site Wide Impact
Filed US 62/141,126
-
Triggered Targetting
Filed US 62/140,366
-
Most Impactful Experiments
Filed US 62/141,193
Languages
-
English
Full professional proficiency
-
Mandarin
Native or bilingual proficiency
-
Cantonese
Professional working proficiency
-
French
Elementary proficiency
Organizations
-
American Statistical Association
-
- Present
More activity by Nanyu
-
In a well-deserved win, Franklin Suguitan Jr. (Dre) has triumphed in this quarter's Potion Shop Competition at California Polytechnic State…
-
I've been an avid and opinionated reader of The Economist for over 20 years, but this is the first time I've been quoted in there. What an…
-
Stanford conference on experimentation today. I'll write a few notes here. Ramesh Johari Guido Imbens https://lnkd.in/g3NdmfJq Official photos:…
-
Feeling heartwarming and humbled to receive this recognition from my students during my first year as a professor. A big thank you to the professors…
-
Sales Engineers make the best Product Managers. They understand: 1/ the customer's need 2/ the product warts 3/ engineers 4/ leading without being…
-
The paper, False Positives in A/B Tests, was accepted to KDD 2024: https://lnkd.in/gVjj4Wpv We are publishing our nearly final draft for…
-
🎉 Exciting News 🎉 We're thrilled to announce that our paper, "Improving Ego-Cluster for Network Effect Measurement," co-authored by Wentao Su and…
-
Recently I presented at an internal reading group for a deep dive into recent advances in generative LLM modeling. All materials are from open-source…
-
Don't focus on building experience. Build judgment instead. How can it be that so many outrageously successful startups had founders with virtually…