
A Probabilistic Position Bias Model for Short-Video Recommendation Feeds

OLIVIER JEUNEN, ShareChat, United Kingdom


Modern web-based platforms often show ranked lists of recommendations to users, in an attempt to maximise user satisfaction or
business metrics. Typically, the goal of such systems boils down to maximising the exposure probability —conversely, minimising the
rank— for items that are deemed “reward-maximising” according to some metric of interest. This general framing comprises music or
movie streaming applications, as well as e-commerce, restaurant or job recommendations, and even web search. Position bias or user
models can be used to estimate exposure probabilities for each use-case, specifically tailored to how users interact with the presented
rankings. A unifying factor in these diverse problem settings is that typically only one or several items will be engaged with (clicked,
streamed, purchased, et cetera) before a user leaves the ranked list.
Short-video feeds on social media platforms diverge from this general framing in several ways, most notably that users do not tend
to leave the feed after, for example, liking a post. Indeed, seemingly infinite feeds invite users to scroll further down the ranked list. For
this reason, existing position bias or user models tend to fall short in such settings, as they do not accurately capture users’ interaction
modalities. In this work, we propose a novel and probabilistically sound personalised position bias model for feed recommendations.
We focus on a 1st -level feed in a hierarchical structure, where users may enter a 2nd -level feed via any given 1st -level item. We posit that
users come to the platform with a given scrolling budget that is drawn according to a discrete power-law distribution, and show how
the survival function of said distribution can be used to obtain closed-form estimates for personalised exposure probabilities. Empirical
insights gained through data from a large-scale social media platform show how our probabilistic position bias model more accurately
captures empirical exposure than existing models, and paves the way for improved unbiased evaluation and learning-to-rank.

CCS Concepts: • Information systems → Specialized information retrieval; Recommender systems; Evaluation of retrieval
results; • Computing methodologies → Learning in probabilistic graphical models.

Additional Key Words and Phrases: Probabilistic Modelling; Position Bias; Mean Reciprocal Rank

ACM Reference Format:


Olivier Jeunen. 2023. A Probabilistic Position Bias Model for Short-Video Recommendation Feeds. In Seventeenth ACM Conference
on Recommender Systems (RecSys ’23), September 18–22, 2023, Singapore, Singapore. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3604915.3608777

1 INTRODUCTION & MOTIVATION


Recommender system applications on the web often operate in a ranking fashion, showing ordered lists of items to users
in an attempt to optimise some metric(s) of interest. Such metrics typically reflect user satisfaction, business goals or fair-
ness concerns. With the ranking paradigm comes an important caveat: items that are shown at higher positions are more
likely to be exposed to the user, and this discrepancy should be taken into account when considering data from logged user
interactions [21]. Indeed, it has implications for evaluation [4, 14], learning [22], and fairness of exposure [10, 18, 26]. The
problem of position bias and its relevance to recommendation systems has been well-studied in recent years [7, 27, 33, 37].
Most of these existing works focus on the classical Information Retrieval (IR) task of web search, where documents
are ranked as search results to be surfaced for a given query. Effective methods for de-biasing in web search are often
transferable to recommendation domains, when we replace queries with users and documents with items. As a result,
evaluation metrics such as Normalised Discounted Cumulative Gain (nDCG) are a common choice when assessing top-𝑛
recommendation quality [36]. An often overlooked point is that the discount is directly related to position bias, and that
well-chosen discount functions are necessary to consider nDCG an unbiased offline estimator of online reward [17]. This
is a desirable property, as discrepancies between off- and on-line evaluation results have plagued the recommender systems
field for years [3, 12, 13, 16, 19, 32]. Models of user behaviour, such as those underlying the rank-biased precision
(RBP) [25] or expected reciprocal rank (ERR) [5] metrics, can be used to construct discount functions for nDCG-like
metrics that emulate the empirical position bias in a given system well. Existing work in this area has largely focused
on web search [8], with extensions to general recommendation use-cases in e-commerce [24].
Short-video feeds on social media platforms, however, imply very different interaction paradigms than those prevalent
in web search. Indeed, users are unlikely to abandon the feed after, for example, liking a post. Users scrolling the feed
have an entertainment need rather than an information need. As such, user models that are prevalent in other application
areas are not directly applicable to our use-case [43]. Aside from more general work by Wu et al. [41], this topic has
received relatively little research attention. We specifically focus on a 1st -level feed in a hierarchical structure, where
users can either keep scrolling the current feed, or enter a “more-like-this” 2nd -level feed via any 1st -level item. Indeed,
such user interfaces have gained popularity recently, and can be found on Reddit, Instagram and ShareChat, among
others. Our hypothesis is that users come to the platform with a “scrolling budget”, reflecting how far they are willing to
scroll before abandoning the feed. This budget is personalised, context-dependent, and drawn from a discrete power-law
distribution such as the Yule-Simon distribution with shape parameter 𝜌 [34, 42]. Figure 2(a) visualises how this family
of distributions can represent a wide variety of stochastic budgets and, hence, scrolling behaviours.
We show how the survival function of this distribution can be used to obtain closed-form estimates for personalised
exposure probabilities that have a sound theoretical basis, and show how they pave the way for improved unbiased
evaluation and learning-to-rank in feed recommendation settings.
The main contributions we present in this paper are the following:
(1) We propose a novel Contextual, Personalised, Probabilistic POsition bias model for feed recommendations: C-3PO.
(2) We empirically validate using real-world data that C-3PO is better able to capture exposure probabilities than
existing methods, whilst having a stronger theoretical basis.
(3) We show how C-3PO can be used for improved unbiased evaluation and learning in feed ranking scenarios.

2 METHODOLOGY & CONTRIBUTIONS


2.1 Position Bias Models
For ease of terminology but without loss of generality, assume we want to maximise the quality (𝑄) of items that are
viewed (𝑉 ) by a user in a certain context (𝑋 ). 1 As is typical, the true quality is not an observable quantity, but we can
estimate it from logged user interactions. We will refer to these as clicks (𝐶) that occur when a post is both viewed and
deemed to be of sufficient quality, but they can represent more general engagement (e.g. likes). A ranking policy 𝜋 is in place, deciding at
which rank (𝑅) a given post will appear, dependent on the context 𝑋 and a quality estimate Q̂. We can describe the
relation between the unobservable quality 𝑄 and the observable quantities 𝐶, 𝑉 and 𝑅 as follows (omitting item 𝑖):

P(𝑄 | 𝑋) = P(𝐶 | 𝑅, 𝑋) / P(𝑉 | 𝑅, 𝑋, 𝜋).    (1)
1 Note that in classical web-search use-cases, view events 𝑉 are unobserved, as multiple items are presented to the user simultaneously [20]. This is
different from our use-case, where items impressed to the user take up most of their mobile screen, and we can extract labels 𝑉 from scrolling behaviour.

[Figure 1: a PGM over the nodes 𝜋, 𝑋, do(𝑅 = 𝑟), 𝑅, 𝑉, 𝐶 and 𝑄.]
Fig. 1. Probabilistic Graphical Model (PGM) detailing how interventional data collection allows for unbiased estimation of the causal
effect between the rank 𝑅 and view events 𝑉 , by removing incoming causal edges from the deployed ranking policy 𝜋 .
This is a well-known result, showing how we can obtain unbiased estimates of quality conditional on context, by
reweighting observed clicks with exposure or viewing probabilities (i.e. inverse propensity scoring, or IPS [30, Ch. 9]). In
this work, we do not focus on selection bias, i.e. bias stemming from the ranking policy 𝜋 influencing the rankings [28, 29].
Instead, we wish to estimate the causal effect of rank on exposure, conditional on context.
Without considering contextual information, Joachims et al. originally proposed to perform randomised interventions
on the rank of a given item to obtain such estimates [22]. Further extensions proposed to leverage historical interventions
stemming from natural experiments [1], and this was further extended to be context-dependent [11]. Whichever of
these methods is adopted, we essentially obtain data from the interventional distribution describing P(𝑉 |do(𝑅 = 𝑟 ), 𝑋 ),
with the do-operator following Pearl’s seminal work [31]. These interventions remove the dependency of empirical
views on the deployed ranking policy 𝜋, and thus:
P(𝑄 | 𝑋) = P(𝐶 | do(𝑅), 𝑋) / P(𝑉 | do(𝑅), 𝑋).    (2)
We visualise this interventional procedure with the Probabilistic Graphical Model (PGM) shown in Figure 1 — this
causal view is often left implicit in related work. The main benefit from this derivation is that it allows us to obtain an
unbiased estimate of item quality from observable quantities alone, but downstream applications of accurate exposure
probabilities are threefold: (1) ranking policies can be evaluated more reliably with offline estimators of online metrics
that depend on exposure [22], (2) ranking policies can be learnt to maximise better objectives by replacing the observed
labels 𝐶 with estimates of quality (essentially de-biasing them) [28], and (3) fairness metrics related to “equity of exposure”
among content creators, for example, can be estimated and optimised more reliably [10, 18].
Note that (1) is essentially what the DCG metric aims to do, by discounting the “gain” at every rank (independent of
context). The standard inverse-logarithmic discount function for DCG is commonly adopted (Pdcg in Eq. 3), and forms
the basis for ranking evaluation in the wider IR and recommendation field. Nevertheless, if the exposure probabilities
implied by this discount function are inaccurate, DCG fails to be an unbiased estimator of online gains. If this is the case,
we can consider other parametric discount functions, with several forms. In line with DCG, it is natural to consider an
inverse logarithmically shaped function (indicating diminishing position bias effects at lower ranks), or an exponential
decay function (in line with the user model adopted by RBP [25]). These families of functions will be the baselines for
our experiments, parameterised by 𝛼 ∈ R+0 and 𝛾 ∈ [0, 1] respectively: 2
Pdcg(𝑉 = 1 | 𝑅 = 𝑟) = 1 / log2(𝑟 + 1),    Plog(𝑉 = 1 | 𝑅 = 𝑟) = 1 / ln(𝑒 + 𝛼(𝑟 − 1)),    Pexp(𝑉 = 1 | 𝑅 = 𝑟) = 𝛾^(𝑟−1).    (3)

2 We interpret them as probabilities, but these discounts were not originally motivated by sound probabilistic models of user behaviour (bar RBP [25]).
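To make these baselines concrete, the three discount functions in Eq. 3 can be implemented directly. The snippet below is an illustrative sketch (not the paper's code), with 𝛼 and 𝛾 left as free parameters to be fitted later.

```python
import numpy as np

def p_dcg(r):
    """Standard DCG discount: 1 / log2(r + 1), for ranks r >= 1."""
    r = np.asarray(r, dtype=float)
    return 1.0 / np.log2(r + 1.0)

def p_log(r, alpha):
    """Inverse-logarithmic discount: 1 / ln(e + alpha * (r - 1))."""
    r = np.asarray(r, dtype=float)
    return 1.0 / np.log(np.e + alpha * (r - 1.0))

def p_exp(r, gamma):
    """Exponential (RBP-style) discount: gamma ** (r - 1)."""
    r = np.asarray(r, dtype=float)
    return gamma ** (r - 1.0)

# Example: exposure probabilities implied at the first five ranks.
ranks = np.arange(1, 6)
print(p_dcg(ranks), p_log(ranks, alpha=100.0), p_exp(ranks, gamma=0.9))
```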

In the context of web search applications, regression- and deep neural network (DNN)-based position bias models
have been proposed as well [2, 40]. Whilst effective, there are practical drawbacks to implementing such methods in
real-world applications: (1) it is computationally intensive to obtain P(𝑉 |𝑅) from a forward pass in a neural network,
(2) it requires significant engineering effort to make such models accessible when performing e.g. offline evaluation or
exposure fairness analyses, (3) DNNs do not guarantee robust estimates in low-data regimes and fail to encode desirable
properties such as monotonicity of P(𝑉 |𝑅) w.r.t. 𝑅, and (4) the black-box nature of these models does not allow us to
obtain an improved understanding of how users interact with the rankings they are presented with. For these reasons,
we place simple models at the focal point of our work, with a minimal amount of learnable parameters.
Finally, note that all of the aforementioned models are independent of contextual information 𝑋 . Fang et al. propose
a DNN-based model whose output reflects P(𝑉 |𝑅, 𝑋 ) instead; adopting a classical Multi-Layer Perceptron (MLP)
architecture while conjecturing that architectural improvements could improve results [11]. Naturally, their method
suffers from the same drawbacks as non-contextual deep models. Wu et al. leverage Gradient-Boosted Decision Trees
for an e-commerce feed recommendation application, with similar concerns for practical implementations [41]. We can
address some of these shortcomings by viewing position bias through a probabilistic lens, as it provides a theoretically
sound basis to reason about exposure, allowing for more sample- and parameter-efficient learning.
Next to the Position-Based Model, alternative models of user behaviour have been proposed in the web search
literature [43]. Such methods require us to model the probability that a user continues down a ranked list, conditional
on the items viewed so far. As this significantly increases modelling complexity, it is out of scope for this short article.

2.2 Probabilistic Position Bias Models


The intuitive logarithmic discount originally proposed by Järvelin and Kekäläinen has been widely adopted in the
research literature [15], along with the exponential form that motivates RBP [25], or cascade-based alternatives [6, 9].
Model-based approaches [2, 11, 40, 41] hold promise to move beyond mere intuition, but their implementation has
several practical drawbacks that hinder widespread adoption. We adopt the Contextual Position-Based Model (CPBM)
proposed by Fang et al. [11], but aim to tackle the position bias estimation problem through a probabilistic lens.
We introduce an additional random variable 𝐷, referring to “scroll depth”. In our hierarchical use-case, users scroll
through ranked posts on the 1st -level feed until they decide to either enter a 2nd -level feed via an item of their liking, or
decide to abandon the feed altogether. A large majority of sessions sees users entering a 2nd -level feed via a highly
ranked post, fewer sessions lead to 2nd -level feeds at lower ranks, and a small minority sees users abandoning the feed
after scrolling further. As such, our discrete scroll depth random variable follows a power-law distribution, such as the
Yule-Simon distribution visualised in Figure 2(a) [34, 42]. This distribution includes a shape parameter 𝜌 ∈ R+0 . When
B(·, ·) represents the Beta function, its Probability Mass Function (PMF) at depth 𝐷 = 𝑑 is given by:

PYule−Simon(𝜌 ) (𝐷 = 𝑑) = 𝜌B(𝑑, 𝜌 + 1). (4)

Having defined a distribution for scroll depth, we can define the relationship between scroll depth and position bias.
Indeed, a post ranked at position 𝑅 = 𝑟 will be viewed if and only if the user decides to scroll at least up until that rank.
This implies a negative relationship with the Cumulative Distribution Function (CDF) of 𝐷:

P(𝑉 = 1|𝑅 = 𝑟 ) = P(𝐷 ≥ 𝑟 ) = 1 − P(𝐷 < 𝑟 ). (5)

That is, the position bias at rank 𝑟 can be derived from the survival function of the scroll depth distribution. For the
Yule-Simon(𝜌)-distribution, this quantity is given by:

Pprob(𝑉 = 1 | 𝑅 = 𝑟) = 1 if 𝑟 = 1, and (𝑟 − 1) B(𝑟 − 1, 𝜌 + 1) otherwise.    (6)


This derivation gives rise to a probabilistic position bias model that is theoretically sound, and motivated by a plausible
model of user behaviour, specifically tailored to the use-case of feed recommendations. Note that we adopt the Yule-
Simon distribution largely for illustrative purposes in this Section, but that our analysis remains general. As such,
we could adopt more involved probability distributions for scroll depth to reflect position bias curves that would be
unrealisable by Yule-Simon. A promising alternative here is the generalised Waring distribution, as it can reflect a
wider set of position bias curves [23, §6.2.3]. As this probability distribution is defined by multiple parameters, it is
out-of-scope for the purposes of this short article. Even though the probabilistic position bias model proposed so far is
neither contextual nor personalised, we expect the position bias curves emanating from Eq. 6 to be more aligned with
empirical biases in feed recommendation scenarios than those produced by the classical approximations in Eq. 3.
We visualise the position bias curves that emanate from our proposed model in Figure 2(b), contrasting them with
those arising from classical approximations. Indeed, we observe that Pdcg does not discount aggressively enough, and
that neither Plog nor Pexp can accurately represent position bias curves similar to those of our probabilistically inspired method.
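For concreteness, the following sketch (ours, not the paper's implementation) computes the Yule-Simon PMF of Eq. 4 and the survival-function position bias of Eq. 6 with SciPy's Beta function; the parameter value is arbitrary.

```python
import numpy as np
from scipy.special import beta  # Euler Beta function B(x, y)

def yule_simon_pmf(d, rho):
    """Eq. 4: P(D = d) = rho * B(d, rho + 1), for integer scroll depths d >= 1."""
    d = np.asarray(d, dtype=float)
    return rho * beta(d, rho + 1.0)

def p_prob(r, rho):
    """Eq. 6: P(V = 1 | R = r) = P(D >= r), the Yule-Simon survival function."""
    r = np.asarray(r, dtype=float)
    out = np.ones_like(r)
    mask = r > 1
    out[mask] = (r[mask] - 1.0) * beta(r[mask] - 1.0, rho + 1.0)
    return out

ranks = np.arange(1, 11)
print(yule_simon_pmf(ranks, rho=1.5))  # decaying scroll-depth probabilities
print(p_prob(ranks, rho=1.5))          # monotonically decreasing exposure
```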

2.2.1 Connections to Mean Reciprocal Rank (MRR) [39]. A commonly used offline evaluation metric in the web search
domain is MRR, which averages the reciprocal rank of the first relevant document in every ranking. When assuming
binary relevance labels and a single relevant item per ranking, this metric is equivalent to DCG where the discount
corresponds to the reciprocal rank: Prr(𝑉 = 1 | 𝑅 = 𝑟) = 1/𝑟. Recent work in the e-commerce recommendation domain
highlights how this discount function aligns much more closely with the empirical position bias on their platform, and
reports improved alignment between offline and online evaluation metrics [24]. There is an intricate connection between
the reciprocal rank discount and our proposed method, when we consider the Yule-Simon distribution with parameter
𝜌 = 1. To see this, recall that the Beta function can be written in terms of Gamma functions as B(𝑥, 𝑦) = Γ(𝑥)Γ(𝑦) / Γ(𝑥 + 𝑦),
and Γ(𝑥) = (𝑥 − 1)! for positive integers 𝑥. Then, consider for 𝑟 > 1:

Pprob(𝑉 = 1 | 𝑅 = 𝑟) = (𝑟 − 1) B(𝑟 − 1, 2) = (𝑟 − 1) Γ(𝑟 − 1)Γ(2) / Γ(𝑟 + 1) = (𝑟 − 1) Γ(𝑟 − 1) / Γ(𝑟 + 1) = (𝑟 − 1) / ((𝑟 − 1) 𝑟) = 1 / 𝑟 = Prr(𝑉 = 1 | 𝑅 = 𝑟). □    (7)
As such, the specific discount function that gives rise to MRR can be seen as a special case of our proposed method,
when we adopt the Yule-Simon distribution with parameter 𝜌 = 1. This is reassuring, given MRR’s strong track record
in IR research and its recent successes [24]. Our probabilistic model for scroll depth gives rise to a theoretically sound
motivation for MRR. Furthermore, the flexibility of our framework allows us to adopt alternative probability distributions
with varying parameterisations, giving rise to a rich family of position bias curves, as shown in Figure 2(b).
Existing work has connected RR-based metrics to cascade user models in web search use-cases with graded relevance
labels [5]. Their derivation still holds when replacing Prr with the more general and parameterisable Pprob [5, Def. 1].
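This equivalence is also easy to check numerically; a small sanity check along these lines (reusing the survival-function formula of Eq. 6) confirms that 𝜌 = 1 recovers the reciprocal-rank discount.

```python
import numpy as np
from scipy.special import beta

# Eq. 7 check: with rho = 1, the Yule-Simon survival function equals 1 / r.
ranks = np.arange(2, 101, dtype=float)
p_prob_rho1 = (ranks - 1.0) * beta(ranks - 1.0, 2.0)  # Eq. 6 with rho = 1
p_rr = 1.0 / ranks                                    # reciprocal-rank discount
assert np.allclose(p_prob_rho1, p_rr)
print("max abs. difference:", np.max(np.abs(p_prob_rho1 - p_rr)))
```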

2.2.2 Contextual, Personalised, Probabilistic Position Bias Models: C-3PO. We wish to incorporate contextual information
encoded in 𝑋 , to obtain a conditional probability distribution for the scroll depth: P(𝐷 |𝑋 ). To this end, we aim to learn
a function that maps 𝑋 to the parameters that define 𝐷’s distribution. In the case of Yule-Simon, this means we wish to
learn a parameterised function 𝜌𝜃 (𝑋 ) s.t. the implied conditional probability distribution for the position bias model
maximises the likelihood of observed data D:

∏_{(𝑣,𝑟,𝑥) ∈ D} P(𝑉 = 𝑣 | 𝑅 = 𝑟, 𝑋 = 𝑥).    (8)


[Figure 2: (a) Distribution of Scroll Depth (Yule-Simon): PMF P(D = d) against scroll depth d, for shape parameters 𝜌 ranging from 0.50 to 2.30. (b) Position Bias Estimates: P(V = 1|R = r) against rank r for Pprob with varying 𝜌, alongside Pdcg, Plog (𝛼 = 10²) and Pexp (𝛾 = 0.9).]
Fig. 2. Visualising the distributions induced by our proposed probabilistically inspired position bias models.
Left (a): the Probability Mass Function (PMF) for the discrete Yule-Simon distribution with varying shape parameter 𝜌, at different
scroll depth levels. By personalising the parameter 𝜌, we can express a wide breadth of distributions that represent scrolling behaviour.
Right (b): the position bias curves that emanate from our probabilistically inspired position bias model, with varying shape parameter
𝜌, along with common traditional methods. Our proposed model captures scrolling behaviour that is different to existing methods.
As is common, we minimise negative log-likelihood (NLL) instead:
ℓ(P̂; D) = − (1/|D|) ∑_{(𝑣,𝑟,𝑥) ∈ D} log P̂(𝑉 = 𝑣 | 𝑅 = 𝑟, 𝑋 = 𝑥),    (9)

where P̂ denotes the estimated position bias model. For the Yule-Simon distribution, we plug in the position bias formula
from Eq. 6 with the learnt function 𝜌𝜃(𝑥) to obtain:

P̂prob(𝑉 = 1 | 𝑅 = 𝑟, 𝑋 = 𝑥) = 1 if 𝑟 = 1, and (𝑟 − 1) B(𝑟 − 1, 𝜌𝜃(𝑥) + 1) otherwise.    (10)

Using standard gradient descent techniques, we can now aim to learn the optimal parameters 𝜃 for 𝜌𝜃 (𝑥) that minimise
the NLL in Eq. 9. Note that we can introduce rank-based weights or cut-offs to, for example, up-weight higher positions
in the list. When the position bias model has limited capacity, this approach can help to optimise the accuracy of the
model in typical top-𝐾 scenarios. Indeed, the lower exposure probabilities at the bottom of the rankings are easier to
model, and this can drown out the importance of the higher positions. We denote such cut-offs with NLL@𝐾.
Finally, note that the optimisation procedure derived in this Subsection is not restricted to our probabilistically
motivated position bias model. Indeed, we can equally learn the parameters of the logarithmic and exponential
forms in Eq. 3 that minimise the NLL of the resulting distributions, which we do in our experiments.
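As an illustration of this procedure (a minimal sketch under our own assumptions, not the exact pipeline used in the paper), the scalar shape parameter 𝜌 can be fitted by minimising the NLL@𝐾 of Eq. 9 with SciPy; fitting 𝛼 or 𝛾 for the forms in Eq. 3 only requires swapping out the position bias formula. The logged arrays of ranks and binary view labels are simulated here for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import beta

def p_prob(r, rho):
    """Eq. 6: Yule-Simon survival function as position bias."""
    r = np.asarray(r, dtype=float)
    out = np.ones_like(r)
    mask = r > 1
    out[mask] = (r[mask] - 1.0) * beta(r[mask] - 1.0, rho + 1.0)
    return out

def nll(rho, ranks, views, k=None, eps=1e-12):
    """Eq. 9: average negative log-likelihood of Bernoulli view labels,
    optionally restricted to ranks <= k (NLL@K)."""
    if k is not None:
        keep = ranks <= k
        ranks, views = ranks[keep], views[keep]
    p = np.clip(p_prob(ranks, rho), eps, 1.0 - eps)
    return -np.mean(views * np.log(p) + (1.0 - views) * np.log(1.0 - p))

# Hypothetical logged data: ranks of impressed items and binary view labels.
rng = np.random.default_rng(0)
ranks = rng.integers(1, 101, size=10_000)
views = (rng.random(10_000) < p_prob(ranks, rho=1.3)).astype(float)

res = minimize_scalar(nll, bounds=(1e-3, 10.0), args=(ranks, views, 100),
                      method="bounded")
print("fitted rho:", res.x)  # should land close to the simulated 1.3
```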

3 EXPERIMENTAL RESULTS & DISCUSSION


We wish to answer the following research questions experimentally:

RQ1 Is our proposed probabilistic method able to model exposure probabilities more accurately than existing methods?
RQ2 Can the model leverage contextual signals effectively?
RQ3 Are the obtained position biases useful for downstream tasks, such as unbiased offline evaluation?

Naturally, position biases are heavily influenced by specific use-cases, platforms and interface choices. The methods
we propose in this work are motivated by a short-video feed recommendation use-case, and even though our proposed
framework is generally applicable, we expect the Yule-Simon instantiation to only hold merit in similar use-cases.

In order to empirically validate the performance of both our and earlier proposed methods, we require interventional
data with logged views 𝑉 , ranks 𝑅, and contexts 𝑋 . To the best of our knowledge and at the time of writing, we are
unaware of any such datasets being publicly available. Existing Learning-to-Rank (LTR) datasets do not contain rank
interventions and deal with web search use-cases, which imply very different modalities to ours. For this reason, we
need to resort to proprietary datasets, but additionally release an open-source Jupyter notebook to reproduce the
position bias curves visualised in Figure 2 at github.com/olivierjeunen/C-3PO-recsys-2023.

3.1 Estimating Exposure Probabilities (RQ1–2)


We obtain a sample of 1 million sessions of feed view events on a social media platform, where rank interventions
occurred following Fig. 1, collected over five days in February 2023. We perform an 80-20% train-test split, aiming to
predict whether recommendations were viewed based on their rank and contextual information.
We compare several non-contextual variants: the standard DCG discount function as well as the logarithmic and
exponential forms in Eq. 3, and the probabilistic method based on the Yule-Simon distribution introduced in Eq. 6.
The latter three methods include a single parameter (𝛼, 𝛾, 𝜌 respectively), which we learn to minimise NLL@𝐾 on the
training set, following the procedure laid out in §2.2.2. We implement this in Python 3.9 with the SciPy library [38].
As an additional baseline, we include a non-parametric method that predicts the empirical average from the training
data. This approach should be expected to outperform the aforementioned methods, but it requires a hard-coded
probability at every rank instead of the single parameter that the logarithmic, exponential, or probabilistic forms require.
Additionally, this approach cannot easily be extended to incorporate contextual information 𝑋 .
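As a sketch (ours, not the paper's code), this empirical baseline is simply a per-rank average of the view labels:

```python
import numpy as np

def empirical_position_bias(ranks, views, max_rank=100):
    """Per-rank empirical view rate P(V = 1 | R = r), for r = 1 .. max_rank."""
    ranks = np.asarray(ranks, dtype=int)
    views = np.asarray(views, dtype=float)
    p_hat = np.empty(max_rank)
    for r in range(1, max_rank + 1):
        at_r = ranks == r
        # Fall back to 0.0 when a rank never occurs in the log (our choice).
        p_hat[r - 1] = views[at_r].mean() if at_r.any() else 0.0
    return p_hat
```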
For the contextual case, we adopt a single continuous user-based feature describing users’ past average scroll depth,
as well as a single continuous context-based feature, describing average scroll depth at the time of day. We adopt
a simple linear model to estimate the distribution parameter from this input 𝑋 : 𝜌𝜃 (𝑥) = 𝜃 ⊺ 𝑥. The functional forms
for the parameters 𝛼 and 𝛾 are analogous. As such, the contextual and personalised methods consist of only three
parameters each (assuming 𝑥 includes a constant 1-feature, emulating a bias term in 𝜃 ). Even in this simplistic scenario,
the contextual and personalised methods significantly outperform those that do not consider this information, as shown
in Table 1. Our contextual, personalised, probabilistic position bias model C-3PO achieves the lowest NLL@𝐾 for a
wide range of 𝐾, whilst requiring a minimum of learnable parameters or computing resources. This yields a desirable
trade-off between parsimony and model expressiveness when compared to complex model classes like neural networks
(which would typically require orders of magnitude more parameters). We observe that this additionally allows us to
be sample-efficient, as our method already performs well with only O(10⁴) samples. Indeed, instead of modelling the
entire curve at every possible value of 𝑟 , our proposed method outputs a single scalar which can be used to obtain
position bias estimates for all natural numbers. The inductive bias we enjoy from well-motivated mathematical models
greatly improves the methods’ real-world usability, when compared to neural-network based alternatives.
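The contextual variant described above is equally compact. The sketch below is our illustration, with simulated features standing in for the proprietary ones and a positivity clamp on 𝜌𝜃(𝑥) that is our own assumption; it fits the three parameters of the linear model 𝜌𝜃(𝑥) = 𝜃⊺𝑥 by minimising NLL@100.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import beta

def p_prob_contextual(ranks, X, theta, eps=1e-6):
    """Eq. 10: position bias with a per-impression shape parameter
    rho_theta(x) = theta^T x (clamped to stay positive; our assumption)."""
    rho = np.maximum(X @ theta, eps)
    out = np.ones_like(ranks, dtype=float)
    mask = ranks > 1
    out[mask] = (ranks[mask] - 1.0) * beta(ranks[mask] - 1.0, rho[mask] + 1.0)
    return out

def nll_at_k(theta, ranks, views, X, k=100, eps=1e-12):
    """NLL@K objective over the logged (view, rank, context) triplets."""
    keep = ranks <= k
    p = np.clip(p_prob_contextual(ranks[keep], X[keep], theta), eps, 1.0 - eps)
    v = views[keep]
    return -np.mean(v * np.log(p) + (1.0 - v) * np.log(1.0 - p))

# Hypothetical features: [1, past avg. scroll depth, time-of-day avg. scroll depth].
rng = np.random.default_rng(1)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(1.0, 0.3, n), rng.normal(1.0, 0.3, n)])
ranks = rng.integers(1, 101, size=n).astype(float)
true_theta = np.array([0.5, 0.4, 0.3])
views = (rng.random(n) < p_prob_contextual(ranks, X, true_theta)).astype(float)

res = minimize(nll_at_k, x0=np.array([1.0, 0.0, 0.0]),
               args=(ranks, views, X), method="Nelder-Mead")
print("fitted theta:", res.x)
```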

3.2 Unbiased Offline Evaluation (RQ3)


The main task position bias models need to perform is to deliver offline estimates of online performance. Given a
dataset of logged impressions D ≔ {(𝑥ᵢ, 𝑎ᵢ, 𝑟ᵢ, 𝑐ᵢ)}_{𝑖=1}^{𝑁} (contexts, actions, ranks, rewards), we wish to estimate the

expected reward we would have obtained under some different ranking policy 𝜋. This policy maps a context 𝑋 and set
of candidate items A𝑥 to a ranked list. We will denote with the shorthand notation 𝜋 (𝑎|𝑥) the rank that item 𝑎 will
be placed at when 𝜋 is presented with context 𝑥 (assuming A𝑥 given). Note that this framing is easily extended to
more general stochastic ranking policies [26].

                         Negative Log-Likelihood (NLL)
Model                    @5       @10      @25      @50      @100
P̂dcg(𝑉 | 𝑅)              0.5453   0.6320   0.5998   0.4973   0.3763
P̂log(𝑉 | 𝑅)              0.5159   0.6001   0.5900   0.5036   0.3833
P̂exp(𝑉 | 𝑅)              0.5202   0.6158   0.6089   0.5101   0.3673
P̂prob(𝑉 | 𝑅)             0.5162   0.6002   0.5873   0.4891   0.3495

P̂empirical(𝑉 | 𝑅)        0.5157   0.5999   0.5843   0.4813   0.3369

P̂log(𝑉 | 𝑅, 𝑋)           0.4852   0.5620   0.5577   0.4806   0.3555
P̂exp(𝑉 | 𝑅, 𝑋)           0.4883   0.5761   0.5778   0.4959   0.3652
P̂prob(𝑉 | 𝑅, 𝑋)          0.4850   0.5620   0.5551   0.4651   0.3325

Table 1. NLL for position bias models on observed data, lower is better. The top-group are independent of contextual information, the
middle baseline is a non-parametric method that predicts a sample average, the bottom-group include three parameters that were
optimised via linear regression. Marked fields indicate stat. sig. improvements over other methods in the same group at a 99% level.

Position Bias Model      P̂dcg(𝑉 | 𝑅)   P̂log(𝑉 | 𝑅)   P̂exp(𝑉 | 𝑅)   P̂prob(𝑉 | 𝑅)
Relative Improvement     100%          -3%           -20%          +16%

Table 2. Relative correlation improvement over P̂dcg between DCG estimates and online metrics, higher is better.

Then, a dataset D and position bias model P̂ can be used to obtain an unbiased estimate of the reward we would obtain under 𝜋:


E_{𝑟∼𝜋}[𝐶] ≈¹ DCG_P̂(D, 𝜋) ≈² (1/𝑁) ∑_{𝑖=1}^{𝑁} 𝑐ᵢ · P̂(𝑉 = 1 | 𝑅 = 𝜋(𝑎ᵢ |𝑥ᵢ), 𝑋 = 𝑥ᵢ) / P̂(𝑉 = 1 | 𝑅 = 𝑟ᵢ, 𝑋 = 𝑥ᵢ).    (11)

Here, the first approximation (≈¹) is due to the inherent assumptions of the DCG metric (compared to, e.g., cascade-based
alternatives), whereas the second (≈²) only exists because we resort to an empirical average over the observed data D
and estimated position biases via P̂. Assuming unbiasedness of P̂, the unbiasedness of the metric in Eq. 11 is easily
recognised, as it is an application of importance sampling or IPS [30]. As is typical for IPS-based methods, techniques
like capping or introducing control variates can improve their finite-sample performance by reducing variance [13, 35].
We do not consider such extensions in this short article, but remark that they are likely to further improve performance.
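A minimal sketch of the estimator in Eq. 11 (our illustration; the array and function names are assumptions, not the paper's code) could look as follows, with an optional weight cap as mentioned above.

```python
import numpy as np

def dcg_ips_estimate(rewards, logged_ranks, new_ranks, p_v, clip=None):
    """Eq. 11: importance-sampled DCG estimate of reward under a new policy.

    rewards      : observed per-impression rewards c_i (e.g. clicks/likes)
    logged_ranks : ranks r_i at which items were actually shown
    new_ranks    : ranks pi(a_i | x_i) the target policy would assign
    p_v          : callable rank -> estimated exposure probability P(V=1|R=r)
    clip         : optional cap on the importance weights (variance reduction)
    """
    weights = p_v(new_ranks) / p_v(logged_ranks)
    if clip is not None:
        weights = np.minimum(weights, clip)
    return np.mean(rewards * weights)

# Example with the reciprocal-rank special case (Yule-Simon, rho = 1).
p_rr = lambda r: 1.0 / np.asarray(r, dtype=float)
rewards = np.array([1.0, 0.0, 1.0, 0.0])
logged_ranks = np.array([1, 2, 3, 4])
new_ranks = np.array([2, 1, 1, 5])  # ranks under the candidate policy
print(dcg_ips_estimate(rewards, logged_ranks, new_ranks, p_rr))
```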
To validate the utility of these offline estimates, we perform an online experiment on a social media platform that
operates a short-video recommendation feed. Thus, we obtain samples from the reward distribution by sampling
E_{𝑟∼𝜋}[𝐶] directly and taking an empirical average per day, for five days. Then, for varying context-independent position
bias models (optimised @100), we obtain offline estimates of online reward via Eq. 11, and evaluate the offline estimates
by Pearson’s correlation coefficient between the ground truth and the offline estimate, over 5 days.
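The offline-online comparison itself then boils down to a single correlation computation; a sketch with hypothetical daily values (the numbers below are made up purely for illustration) would be:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical daily values: online reward averages and offline DCG estimates
# produced by Eq. 11 under a given position bias model, over five days.
online_reward = np.array([0.112, 0.108, 0.115, 0.119, 0.110])
offline_estimate = np.array([0.105, 0.101, 0.109, 0.114, 0.104])

corr, p_value = pearsonr(online_reward, offline_estimate)
print(f"Pearson correlation: {corr:.3f} (p = {p_value:.3f})")
```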
Table 2 shows relative improvements in correlation over the classical DCG formulation. We observe that our
probabilistically motivated position bias model is able to significantly improve the offline-online correlation compared to
existing methods, and conjecture that the context-dependent variant can lead to further improvements. This highlights
the importance of a well-motivated position bias model, and is a strong argument in favour of our proposed methods.

4 CONCLUSIONS & OUTLOOK


In this work, we have argued the value of a probabilistically motivated position bias model to accurately estimate
exposure probabilities with a minimal number of parameters. We have presented a specific instantiation of this general
idea, modelling scroll depth via the Yule-Simon distribution, and leveraging its survival function to obtain closed-form
position bias estimates. A general approach to learn distribution parameters conditional on contextual information via
Maximum Likelihood Estimation, amenable to gradient descent, allows for flexible and efficient optimisation. Using real-
world data from a social media platform, we have empirically validated that our novel methods model empirical exposure
significantly more accurately than competing methods, and that the obtained estimates pave the way for improvements
in unbiased offline evaluation. In future work, we wish to extend our approach to incorporate cascade-based alternatives.

REFERENCES
[1] A. Agarwal, I. Zaitsev, X. Wang, C. Li, M. Najork, and T. Joachims. 2019. Estimating Position Bias without Intrusive Interventions. In Proc. of the
Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19). ACM, 474–482. https://doi.org/10.1145/3289600.3291017
[2] Q. Ai, K. Bi, C. Luo, J. Guo, and W. B. Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In The 41st International ACM
SIGIR Conference on Research & Development in Information Retrieval (SIGIR ’18). ACM, 385–394. https://doi.org/10.1145/3209978.3209986
[3] J. Beel, M. Genzmehr, S. Langer, A. Nürnberger, and B. Gipp. 2013. A Comparative Analysis of Offline and Online Evaluations and Discussion of
Research Paper Recommender System Evaluation. In Proc. of the International Workshop on Reproducibility and Replication in Recommender Systems
Evaluation (RepSys ’13). 7–14.
[4] P. Castells and A. Moffat. 2022. Offline recommender system evaluation: Challenges and new directions. AI Magazine 43, 2 (2022), 225–238.
https://doi.org/10.1002/aaai.12051
[5] O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. 2009. Expected Reciprocal Rank for Graded Relevance. In Proc. of the 18th ACM Conference on
Information and Knowledge Management (CIKM ’09). ACM, 621–630. https://doi.org/10.1145/1645953.1646033
[6] O. Chapelle and Y. Zhang. 2009. A Dynamic Bayesian Network Click Model for Web Search Ranking. In Proc. of the 18th International Conference on
World Wide Web (WWW ’09). ACM, 1–10. https://doi.org/10.1145/1526709.1526711
[7] J. Chen, H. Dong, X. Wang, F. Feng, M. Wang, and X. He. 2022. Bias and Debias in Recommender System: A Survey and Future Directions. ACM
Trans. Inf. Syst. (oct 2022). https://doi.org/10.1145/3564284 Just Accepted.
[8] A. Chuklin, I. Markov, and M. de Rijke. 2015. Click Models for Web Search. Morgan & Claypool. https://doi.org/10.2200/S00654ED1V01Y201507ICR043
[9] N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. 2008. An Experimental Comparison of Click Position-Bias Models. In Proc. of the 2008 International
Conference on Web Search and Data Mining (WSDM ’08). ACM, 87–94. https://doi.org/10.1145/1341531.1341545
[10] F. Diaz, B. Mitra, M. D. Ekstrand, A. J. Biega, and B. Carterette. 2020. Evaluating Stochastic Rankings with Expected Exposure. In Proc. of the 29th
ACM International Conference on Information & Knowledge Management (CIKM ’20). ACM, 275–284. https://doi.org/10.1145/3340531.3411962
[11] Z. Fang, A. Agarwal, and T. Joachims. 2019. Intervention Harvesting for Context-Dependent Examination-Bias Estimation. In Proc. of the 42nd
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’19). ACM, 825–834. https://doi.org/10.1145/
3331184.3331238
[12] F. Garcin, B. Faltings, O. Donatsch, A. Alazzawi, C. Bruttin, and A. Huber. 2014. Offline and Online Evaluation of News Recommender Systems at
Swissinfo.Ch. In Proc. of the 8th ACM Conference on Recommender Systems (RecSys ’14). 169–176. https://doi.org/10.1145/2645710.2645745
[13] A. Gilotte, C. Calauzènes, T. Nedelec, A. Abraham, and S. Dollé. 2018. Offline A/B Testing for Recommender Systems. In Proc. of the Eleventh ACM
International Conference on Web Search and Data Mining (WSDM ’18). ACM, 198–206. https://doi.org/10.1145/3159652.3159687
[14] K. Hofmann, A. Schuth, A. Bellogín, and M. de Rijke. 2014. Effects of Position Bias on Click-Based Recommender Evaluation. In Advances in
Information Retrieval. Springer International Publishing, Cham, 624–630.
[15] K. Järvelin and J. Kekäläinen. 2002. Cumulated Gain-Based Evaluation of IR Techniques. ACM Trans. Inf. Syst. 20, 4 (oct 2002), 422–446. https:
//doi.org/10.1145/582415.582418
[16] O. Jeunen. 2019. Revisiting Offline Evaluation for Implicit-feedback Recommender Systems. In Proc. of the 13th ACM Conference on Recommender
Systems (RecSys ’19). ACM, 596–600. https://doi.org/10.1145/3298689.3347069
[17] O. Jeunen. 2021. Offline Approaches to Recommendation with Online Success. Ph. D. Dissertation. University of Antwerp.
[18] O. Jeunen and B. Goethals. 2021. Top-K Contextual Bandits with Equity of Exposure. In Proc. of the 15th ACM Conference on Recommender Systems
(RecSys ’21). ACM, 310–320. https://doi.org/10.1145/3460231.3474248
[19] O. Jeunen, K. Verstrepen, and B. Goethals. 2018. Fair Offline Evaluation Methodologies for Implicit-feedback Recommender Systems with MNAR
Data. In Proc. of the REVEAL 18 Workshop on Offline Evaluation for Recommender Systems (RecSys ’18).
[20] T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay. 2005. Accurately Interpreting Clickthrough Data As Implicit Feedback. In Proc. of
the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’05). ACM, 154–161. https:
//doi.org/10.1145/1076034.1076063
[21] T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. 2007. Evaluating the Accuracy of Implicit Feedback from Clicks and Query
Reformulations in Web Search. ACM Trans. Inf. Syst. 25, 2 (apr 2007), 7–es. https://doi.org/10.1145/1229179.1229181
[22] T. Joachims, A. Swaminathan, and T. Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proc. of the Tenth ACM International
Conference on Web Search and Data Mining (WSDM ’17). ACM, 781–789. https://doi.org/10.1145/3018661.3018699
[23] N. L. Johnson, A. W. Kemp, and S. Kotz. 2005. Univariate discrete distributions. Vol. 444. John Wiley & Sons.
[24] M. J. Mei, C. Zuber, and Y. Khazaeni. 2022. A Lightweight Transformer for Next-Item Product Recommendation. In Proc. of the 16th ACM Conference
on Recommender Systems (RecSys ’22). ACM, 546–549. https://doi.org/10.1145/3523227.3547491

[25] A. Moffat and J. Zobel. 2008. Rank-Biased Precision for Measurement of Retrieval Effectiveness. ACM Trans. Inf. Syst. 27, 1, Article 2 (dec 2008),
27 pages. https://doi.org/10.1145/1416950.1416952
[26] H. Oosterhuis. 2021. Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness. In Proc. of the 44th
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21). ACM, 1023–1032. https://doi.org/10.1145/
3404835.3462830
[27] H. Oosterhuis. 2023. Doubly Robust Estimation for Correcting Position Bias in Click Feedback for Unbiased Learning to Rank. ACM Trans. Inf. Syst.
41, 3, Article 61 (feb 2023), 33 pages. https://doi.org/10.1145/3569453
[28] H. Oosterhuis and M. de Rijke. 2020. Policy-Aware Unbiased Learning to Rank for Top-k Rankings. In Proc. of the 43rd International ACM SIGIR
Conference on Research and Development in Information Retrieval (SIGIR ’20). ACM, 489–498. https://doi.org/10.1145/3397271.3401102
[29] Z. Ovaisi, R. Ahsan, Y. Zhang, K. Vasilaky, and E. Zheleva. 2020. Correcting for Selection Bias in Learning-to-Rank Systems. In Proc. of The Web
Conference 2020 (WWW ’20). ACM, 1863–1873. https://doi.org/10.1145/3366423.3380255
[30] A. B. Owen. 2013. Monte Carlo theory, methods and examples.
[31] J. Pearl. 2009. Causality. Cambridge university press.
[32] M. Rossetti, F. Stella, and M. Zanker. 2016. Contrasting Offline and Online Results when Evaluating Recommendation Algorithms. In Proc. of the
10th ACM Conference on Recommender Systems (RecSys ’16). ACM, 31–34. https://doi.org/10.1145/2959100.2959176
[33] M. Ruffini, V. Bellini, A. Buchholz, G. Di Benedetto, and Y. Stein. 2022. Modeling Position Bias Ranking for Streaming Media Services. In Companion
Proc. of the Web Conference 2022 (WWW ’22). ACM, 72–76. https://doi.org/10.1145/3487553.3524210
[34] H. A. Simon. 1955. On a Class of Skew Distribution Functions. Biometrika 42, 3-4 (12 1955), 425–440. https://doi.org/10.1093/biomet/42.3-4.425
[35] A. Swaminathan and T. Joachims. 2015. The Self-Normalized Estimator for Counterfactual Learning. In Advances in Neural Information Processing
Systems. 3231–3239. https://proceedings.neurips.cc/paper/2015/file/39027dfad5138c9ca0c474d71db915c3-Paper.pdf
[36] D. Valcarce, A. Bellogín, J. Parapar, and P. Castells. 2020. Assessing ranking metrics in top-N recommendation. Information Retrieval Journal 23, 4
(01 Aug 2020), 411–448. https://doi.org/10.1007/s10791-020-09377-x
[37] A. Vardasbi, H. Oosterhuis, and M. de Rijke. 2020. When Inverse Propensity Scoring Does Not Work: Affine Corrections for Unbiased Learning
to Rank. In Proc. of the 29th ACM International Conference on Information & Knowledge Management (CIKM ’20). ACM, 1475–1484. https:
//doi.org/10.1145/3340531.3412031
[38] P. Virtanen, R. Gommers, T. E. Oliphant, et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 3
(2020), 261–272. https://doi.org/10.1038/s41592-019-0686-2
[39] E. M. Voorhees et al. 1999. The TREC-8 question answering track report. In TREC, Vol. 99. 77–82.
[40] X. Wang, N. Golbandi, M. Bendersky, D. Metzler, and M. Najork. 2018. Position Bias Estimation for Unbiased Learning to Rank in Personal Search.
In Proc. of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM ’18). ACM, 610–618. https://doi.org/10.1145/3159652.
3159732
[41] X. Wu, H. Chen, J. Zhao, L. He, D. Yin, and Y. Chang. 2021. Unbiased Learning to Rank in Feeds Recommendation. In Proc. of the 14th ACM
International Conference on Web Search and Data Mining (WSDM ’21). ACM, 490–498. https://doi.org/10.1145/3437963.3441751
[42] G. U. Yule. 1925. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F. R. S. Philosophical Transactions of the Royal
Society of London. Series B, Containing Papers of a Biological Character 213, 402-410 (1925), 21–87. https://doi.org/10.1098/rstb.1925.0002
[43] F. Zhang, Y. Liu, J. Mao, M. Zhang, and S. Ma. 2020. User behavior modeling for Web search evaluation. AI Open 1 (2020), 40–56. https:
//doi.org/10.1016/j.aiopen.2021.02.003
