Probabilistic Position Bias
to leave the feed after, for example, liking a post. Indeed, seemingly infinite feeds invite users to scroll further down the ranked list. For
this reason, existing position bias or user models tend to fall short in such settings, as they do not accurately capture users’ interaction
modalities. In this work, we propose a novel and probabilistically sound personalised position bias model for feed recommendations.
We focus on a 1st-level feed in a hierarchical structure, where users may enter a 2nd-level feed via any given 1st-level item. We posit that
users come to the platform with a given scrolling budget that is drawn according to a discrete power-law distribution, and show how
the survival function of said distribution can be used to obtain closed-form estimates for personalised exposure probabilities. Empirical
insights gained through data from a large-scale social media platform show how our probabilistic position bias model more accurately
captures empirical exposure than existing models, and paves the way for improved unbiased evaluation and learning-to-rank.
CCS Concepts: • Information systems → Specialized information retrieval; Recommender systems; Evaluation of retrieval
results; • Computing methodologies → Learning in probabilistic graphical models.
Additional Key Words and Phrases: Probabilistic Modelling; Position Bias; Mean Reciprocal Rank
RecSys ’23, September 18–22, 2023, Singapore, Singapore Olivier Jeunen
transferable to recommendation domains, when we replace queries with users and documents with items. As a result,
evaluation metrics such as Normalised Discounted Cumulative Gain (nDCG) are a common choice when assessing top-𝑛
recommendation quality [36]. An often overlooked point is that the discount is directly related to position bias, and that
well-chosen discount functions are necessary to consider nDCG an unbiased offline estimator of online reward [17]. This
is a desirable feat, as discrepancies between off- and on-line evaluation results have plagued the recommender systems
field for years [3, 12, 13, 16, 19, 32]. Models of user behaviour, such as those underlying the rank-biased precision
(RBP) [25] or expected reciprocal rank (ERR) [5] metrics, can be used to construct discount functions for nDCG-like
metrics that emulate the empirical position bias in a given system well. Existing work in this area has largely focused
on web search [8], with extensions to general recommendation use-cases in e-commerce [24].
Short-video feeds on social media platforms, however, imply very different interaction paradigms than those prevalent
in web search. Indeed, users are unlikely to abandon the feed after, for example, liking a post. There is no information,
but rather an entertainment need for users scrolling the feed. As such, user models that are prevalent in other application
areas are not directly applicable to our use-case [43]. Aside from more general work by Wu et al. [41], this topic has
received relatively little research attention. We specifically focus on a 1st-level feed in a hierarchical structure, where
users can either keep scrolling the current feed, or enter a “more-like-this” 2nd-level feed via any 1st-level item. Indeed,
such user interfaces have gained popularity recently, and can be found on Reddit, Instagram and ShareChat, among
others. Our hypothesis is that users come to the platform with a “scrolling budget”, reflecting how far they are willing to
scroll before abandoning the feed. This budget is personalised, context-dependent, and drawn from a discrete power-law
distribution such as the Yule-Simon distribution with shape parameter 𝜌 [34, 42]. Figure 2(a) visualises how this family
of distributions can represent a wide variety of stochastic budgets and, hence, scrolling behaviours.
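As a concrete illustration of this flexibility, the sketch below evaluates the Yule-Simon PMF for a few shape parameters, mirroring Figure 2(a). It assumes SciPy's `scipy.stats.yulesimon`, which parameterises the distribution by its shape parameter; the specific ρ values are illustrative only.

```python
from scipy.stats import yulesimon

# Yule-Simon PMF over scroll depths for a few shape parameters rho,
# mirroring Figure 2(a). Smaller rho yields a heavier tail, i.e. more
# probability mass on deep scrolls.
for rho in (0.9, 1.5, 2.3):
    pmf = [yulesimon.pmf(d, rho) for d in range(1, 11)]
    print(f"rho={rho}: P(D=1)={pmf[0]:.3f}, P(D=10)={pmf[-1]:.5f}")
```

A user with ρ = 1.5 stops after the first item with probability ρ/(ρ + 1) = 0.6, so larger shape parameters model shallower scrollers.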
We show how the survival function of this distribution can be used to obtain closed-form estimates for personalised
exposure probabilities that have a sound theoretical basis, and show how they pave the way for improved unbiased
evaluation and learning-to-rank in feed recommendation settings.
The main contributions we present in this paper are the following:
(1) We propose a novel Contextual, Personalised, Probabilistic POsition bias model for feed recommendations: C-3PO.
(2) We empirically validate using real-world data that C-3PO is better able to capture exposure probabilities than
existing methods, whilst having a stronger theoretical basis.
(3) We show how C-3PO can be used for improved unbiased evaluation and learning in feed ranking scenarios.
Fig. 1. Probabilistic Graphical Model (PGM) detailing how interventional data collection allows for unbiased estimation of the causal
effect between the rank 𝑅 and view events 𝑉 , by removing incoming causal edges from the deployed ranking policy 𝜋 .
This is a well-known result, showing how we can obtain unbiased estimates of quality conditional on context, by
reweighting observed clicks with exposure or viewing probabilities (i.e. inverse propensity scoring, or IPS [30, Ch. 9]). In
this work, we do not focus on selection bias, i.e. bias stemming from the ranking policy 𝜋 influencing the rankings [28, 29].
Instead, we wish to estimate the causal effect of rank on exposure, conditional on context.
Without considering contextual information, Joachims et al. originally proposed to perform randomised interventions
on the rank of a given item to obtain such estimates [22]. Further extensions proposed to leverage historical interventions
stemming from natural experiments [1], and this was further extended to be context-dependent [11]. Whichever of
these methods is adopted, we essentially obtain data from the interventional distribution describing P(𝑉 |do(𝑅 = 𝑟 ), 𝑋 ),
with the do-operator following Pearl’s seminal work [31]. These interventions remove the dependency of empirical
views on the deployed ranking policy 𝜋, and thus:
P(Q | X) = P(C | do(R), X) / P(V | do(R), X).    (2)
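To make Eq. 2 concrete, the following simulation sketch (all quantities hypothetical, not from the paper's data) generates an interventional log under an assumed reciprocal-rank position bias and a fixed quality level, then recovers that quality by dividing empirical click rates by empirical view rates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated interventional log: ranks are randomised, i.e. do(R = r);
# views follow an assumed 1/r position bias; clicks occur on viewed
# items with a fixed (hypothetical) quality probability.
true_quality = 0.3
ranks = rng.integers(1, 21, size=200_000)
views = rng.random(ranks.shape) < 1.0 / ranks
clicks = views & (rng.random(ranks.shape) < true_quality)

# Eq. 2: P(Q|X) = P(C|do(R), X) / P(V|do(R), X) recovers the quality,
# independent of the position bias that generated the views.
quality_hat = clicks.mean() / views.mean()
```

Because the ranks are randomised, the ratio of click rate to view rate is no longer confounded by the deployed ranking policy π.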
We visualise this interventional procedure with the Probabilistic Graphical Model (PGM) shown in Figure 1 — this
causal view is often left implicit in related work. The main benefit from this derivation is that it allows us to obtain an
unbiased estimate of item quality from observable quantities alone, but downstream applications of accurate exposure
probabilities are threefold: (1) ranking policies can be evaluated more reliably with offline estimators of online metrics
that depend on exposure [22], (2) ranking policies can be learnt to maximise better objectives by replacing the observed
labels 𝐶 with estimates of quality (essentially de-biasing them) [28], and (3) fairness metrics related to “equity of exposure”
among content creators, for example, can be estimated and optimised more reliably [10, 18].
Note that (1) is essentially what the DCG metric aims to do, by discounting the “gain” at every rank (independent of
context). The standard inverse-logarithmic discount function for DCG is commonly adopted (Pdcg in Eq. 3), and forms
the basis for ranking evaluation in the wider IR and recommendation field. Nevertheless, if the exposure probabilities
implied by this discount function are inaccurate, DCG fails to be an unbiased estimator of online gains. If this is the case,
we can consider other parametric discount functions, with several forms. In line with DCG, it is natural to consider an
inverse logarithmically shaped function (indicating diminishing position bias effects at lower ranks), or an exponential
decay function (in line with the user model adopted by RBP [25]). These families of functions will be the baselines for
our experiments, parameterised by α ∈ ℝ⁺₀ and γ ∈ [0, 1] respectively:²

Pdcg(V = 1|R = r) = 1 / log₂(r + 1),    Plog(V = 1|R = r) = 1 / ln(e + α(r − 1)),    Pexp(V = 1|R = r) = γ^(r−1).    (3)
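The three parametric families in Eq. 3 can be written down directly; the α and γ defaults below are illustrative, not fitted values:

```python
import numpy as np

# Baseline discount functions from Eq. 3, interpreted as exposure
# probabilities P(V = 1 | R = r).
def p_dcg(r):
    return 1.0 / np.log2(r + 1)

def p_log(r, alpha=100.0):
    return 1.0 / np.log(np.e + alpha * (r - 1))

def p_exp(r, gamma=0.9):
    return gamma ** (r - 1)

# All three families assign probability 1 to the top rank (r = 1).
top = (p_dcg(1), p_log(1), p_exp(1))
```

Note that the exponential family eventually discounts far more aggressively than the inverse-logarithmic ones: at large ranks γ^(r−1) vanishes while 1/log₂(r + 1) decays only slowly.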
In the context of web search applications, regression- and deep neural network (DNN)-based position bias models
have been proposed as well [2, 40]. Whilst effective, there are practical drawbacks to implementing such methods in
real-world applications: (1) it is computationally intensive to obtain P(𝑉 |𝑅) from a forward pass in a neural network,
(2) it requires significant engineering effort to make such models accessible when performing e.g. offline evaluation or
exposure fairness analyses, (3) DNNs do not guarantee robust estimates in low-data regimes and fail to encode desirable
² We interpret them as probabilities, but these discounts were not originally motivated by sound probabilistic models of user behaviour (bar RBP [25]).
properties such as monotonicity of P(𝑉 |𝑅) w.r.t. 𝑅, and (4) the black-box nature of these models does not allow us to
obtain an improved understanding of how users interact with the rankings they are presented with. For these reasons,
we place simple models at the focal point of our work, with a minimal number of learnable parameters.
Finally, note that all of the aforementioned models are independent of contextual information 𝑋 . Fang et al. propose
a DNN-based model whose output reflects P(𝑉 |𝑅, 𝑋 ) instead; adopting a classical Multi-Layer Perceptron (MLP)
architecture while conjecturing that architectural improvements could improve results [11]. Naturally, their method
suffers from the same drawbacks as non-contextual deep models. Wu et al. leverage Gradient-Boosted Decision Trees
for an e-commerce feed recommendation application, with similar concerns for practical implementations [41]. We can
address some of these shortcomings by viewing position bias through a probabilistic lens, as it provides a theoretically
sound basis to reason about exposure, allowing for more sample- and parameter-efficient learning.
Next to the Position-Based Model, alternative models of user behaviour have been proposed in the web search
literature [43]. Such methods require us to model the probability that a user continues down a ranked list, conditional
on the items viewed so far. As this significantly increases modelling complexity, it is out of scope for this short article.
Having defined a distribution for scroll depth, we can define the relationship between scroll depth and position bias.
Indeed, a post ranked at position 𝑅 = 𝑟 will be viewed if and only if the user decides to scroll at least up until that rank.
This implies a negative relationship with the Cumulative Distribution Function (CDF) of 𝐷: P(V = 1|R = r) = P(D ≥ r) = 1 − P(D ≤ r − 1).
That is, the position bias at rank 𝑟 can be derived from the survival function of the scroll depth distribution. For the
Yule-Simon(𝜌)-distribution, this quantity is given by:
Pprob(V = 1|R = r) = { 1                          if r = 1,
                     { (r − 1) B(r − 1, ρ + 1)    otherwise.    (6)
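Eq. 6 takes a few lines to implement. As a sanity check (assuming `scipy.special.beta` and `scipy.stats.yulesimon`), consecutive differences of these survival-based exposure probabilities must recover the Yule-Simon PMF, since P(D = r) = P(D ≥ r) − P(D ≥ r + 1):

```python
from scipy.special import beta
from scipy.stats import yulesimon

# Eq. 6: exposure probability as the Yule-Simon(rho) survival function
# evaluated at r - 1; rank 1 is always exposed.
def p_prob(r, rho):
    return 1.0 if r == 1 else (r - 1) * beta(r - 1, rho + 1)

rho = 1.5
# Differences of survival values should equal the probability mass
# function: P(D = r) = P(D >= r) - P(D >= r + 1).
diffs = [p_prob(r, rho) - p_prob(r + 1, rho) for r in range(1, 20)]
pmf = [yulesimon.pmf(r, rho) for r in range(1, 20)]
```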
A Probabilistic Position Bias Model for Short-Video Recommendation Feeds RecSys ’23, September 18–22, 2023, Singapore, Singapore
This derivation gives rise to a probabilistic position bias model that is theoretically sound, and motivated by a plausible
model of user behaviour, specifically tailored to the use-case of feed recommendations. Note that we adopt the Yule-Simon distribution largely for illustrative purposes in this Section, but that our analysis remains general. As such,
we could adopt more involved probability distributions for scroll depth to reflect position bias curves that would be
unrealisable by Yule-Simon. A promising alternative here is the generalised Waring distribution, as it can reflect a
wider set of position bias curves [23, §6.2.3]. As this probability distribution is defined by multiple parameters, it is
out-of-scope for the purposes of this short article. Even though the probabilistic position bias model proposed so far is
neither contextual nor personalised, we expect the position bias curves emanating from Eq. 6 to be more aligned with
empirical biases in feed recommendation scenarios than those produced by the classical approximations in Eq. 3.
We visualise the position bias curves that emanate from our proposed model in Figure 2(b), contrasting them with
those arising from classical approximations. Indeed, we observe that Pdcg does not discount aggressively enough, and that neither Plog nor Pexp can accurately represent position bias curves similar to those of our probabilistically inspired method.
2.2.1 Connections to Mean Reciprocal Rank (MRR) [39]. A commonly used offline evaluation metric in the web search
domain is MRR, which averages the reciprocal rank of the first relevant document in every ranking. When assuming
binary relevance labels and a single relevant item per ranking, this metric is equivalent to DCG where the discount
corresponds to the reciprocal rank: Prr(V = 1|R = r) = 1/r. Recent work in the e-commerce recommendation domain
highlights how this discount function aligns much more closely with the empirical position bias on their platform, and
reports improved alignment between offline and online evaluation metrics [24]. There is an intricate connection between
the reciprocal rank discount and our proposed method, when we consider the Yule-Simon distribution with parameter
ρ = 1. To see this, recall that the Beta function can be written in terms of Gamma functions as B(x, y) = Γ(x)Γ(y)/Γ(x + y), and that Γ(x) = (x − 1)! for positive integers x. Then, consider for r > 1:

Pprob(V = 1|R = r) = (r − 1) B(r − 1, 2) = (r − 1) Γ(r − 1)Γ(2)/Γ(r + 1) = (r − 1) Γ(r − 1)/Γ(r + 1) = (r − 1)/((r − 1) r) = 1/r = Prr(V = 1|R = r). □    (7)
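The equivalence in Eq. 7 is easy to verify numerically (a small sketch, assuming `scipy.special.beta`):

```python
from scipy.special import beta

# Eq. 6 with rho = 1 reduces to the reciprocal-rank discount 1/r (Eq. 7).
def p_prob(r, rho=1.0):
    return 1.0 if r == 1 else (r - 1) * beta(r - 1, rho + 1)

# Largest deviation from 1/r over the first thousand ranks.
max_gap = max(abs(p_prob(r) - 1.0 / r) for r in range(1, 1000))
```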
As such, the specific discount function that gives rise to MRR can be seen as a special case of our proposed method,
when we adopt the Yule-Simon distribution with parameter 𝜌 = 1. This is reassuring, given MRR’s strong track record
in IR research and its recent successes [24]. Our probabilistic model for scroll depth gives rise to a theoretically sound
motivation for MRR. Furthermore, the flexibility of our framework allows us to adopt alternative probability distributions
with varying parameterisations, giving rise to a rich family of position bias curves, as shown in Figure 2(b).
Existing work has connected RR-based metrics to cascade user models in web search use-cases with graded relevance
labels [5]. Their derivation still holds when replacing Prr with the more general and parameterisable Pprob [5, Def. 1].
2.2.2 Contextual, Personalised, Probabilistic Position Bias Models: C-3PO. We wish to incorporate contextual information
encoded in 𝑋 , to obtain a conditional probability distribution for the scroll depth: P(𝐷 |𝑋 ). To this end, we aim to learn
a function that maps 𝑋 to the parameters that define 𝐷’s distribution. In the case of Yule-Simon, this means we wish to
learn a parameterised function 𝜌𝜃 (𝑋 ) s.t. the implied conditional probability distribution for the position bias model
maximises the likelihood of observed data D:
∏_{(v,r,x) ∈ D} P(V = v|R = r, X = x).    (8)
[Figure 2 plots: (a) the Yule-Simon PMF against scroll depth d (log-log axes), for ρ ∈ {0.90, 1.10, 1.30, 1.50, 1.70, 1.90, 2.10, 2.30}; (b) P(V = 1|R = r) against rank r for the same ρ values, alongside Pdcg, Plog (α = 10²), and Pexp (γ = 0.9).]
Fig. 2. Visualising the distributions induced by our proposed probabilistically inspired position bias models.
Left (a): the Probability Mass Function (PMF) for the discrete Yule-Simon distribution with varying shape parameter 𝜌, at different
scroll depth levels. By personalising the parameter 𝜌, we can express a wide breadth of distributions that represent scrolling behaviour.
Right (b): the position bias curves that emanate from our probabilistically inspired position bias model, with varying shape parameter
𝜌, along with common traditional methods. Our proposed model captures scrolling behaviour that is different to existing methods.
As is common, we minimise negative log-likelihood (NLL) instead:
ℓ(P̂; D) = −(1/|D|) ∑_{(v,r,x) ∈ D} log P̂(V = v|R = r, X = x),    (9)

where P̂ denotes the estimated position bias model. For the Yule-Simon distribution, we plug in the position bias formula from Eq. 6 with the learnt function ρθ(x) to obtain:

P̂prob(V = 1|R = r, X = x) = { 1                               if r = 1,
                             { (r − 1) B(r − 1, ρθ(x) + 1)    otherwise.    (10)
Using standard gradient descent techniques, we can now aim to learn the optimal parameters 𝜃 for 𝜌𝜃 (𝑥) that minimise
the NLL in Eq. 9. Note that we can introduce rank-based weights or cut-offs to, for example, up-weight higher positions
in the list. When the position bias model has limited capacity, this approach can help to optimise the accuracy of the
model in typical top-𝐾 scenarios. Indeed, the lower exposure probabilities at the bottom of the rankings are easier to
model, and this can drown out the importance of the higher positions. We denote such cut-offs with NLL@𝐾.
Finally, note that the optimisation procedure derived in this Subsection does not restrict itself solely to our probabilistically motivated position bias model. Indeed, we can learn the parameters for the logarithmic and exponential
forms in Eq. 3 that minimise the NLL of the resulting distributions just as well, which we do in our experiments.
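As a minimal sketch of this procedure — simulated data, a single global ρ rather than a contextual ρθ(x), and `scipy.optimize.minimize_scalar` standing in for gradient descent — we can recover the shape parameter by minimising the NLL of Eq. 9:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import betaln

rng = np.random.default_rng(0)

def p_view(ranks, rho):
    # Eq. 6 in log-space (betaln) for numerical stability; rank 1 is
    # always exposed.
    r = np.asarray(ranks, dtype=float)
    safe = np.maximum(r - 1.0, 1.0)
    return np.where(r == 1.0, 1.0, np.exp(np.log(safe) + betaln(safe, rho + 1.0)))

# Simulated interventional log with a known ground-truth shape parameter.
true_rho = 1.3
ranks = rng.integers(1, 51, size=50_000)
views = rng.random(ranks.shape) < p_view(ranks, true_rho)

def nll(rho):
    # Eq. 9: average Bernoulli negative log-likelihood of observed views.
    p = np.clip(p_view(ranks, rho), 1e-12, 1.0 - 1e-12)
    return -np.mean(views * np.log(p) + (~views) * np.log1p(-p))

rho_hat = minimize_scalar(nll, bounds=(0.1, 5.0), method="bounded").x
```

A contextual variant would replace the scalar ρ with a learnt function ρθ(x) (e.g. a softplus over a linear model of the context features) and minimise the same objective with gradient descent.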
RQ1 Is our proposed probabilistic method able to model exposure probabilities more accurately than existing methods?
RQ2 Can the model leverage contextual signals effectively?
RQ3 Are the obtained position biases useful for downstream tasks, such as unbiased offline evaluation?
Naturally, position biases are heavily influenced by specific use-cases, platforms and interface choices. The methods
we propose in this work are motivated by a short-video feed recommendation use-case, and even though our proposed
framework is generally applicable, we expect the Yule-Simon instantiation to only hold merit in similar use-cases.
In order to empirically validate the performance of both our and earlier proposed methods, we require interventional
data with logged views 𝑉 , ranks 𝑅, and contexts 𝑋 . To the best of our knowledge, no such datasets are publicly
available at the time of writing. Existing Learning-to-Rank (LTR) datasets do not contain rank
interventions and deal with web search use-cases, which imply very different modalities to ours. For this reason, we
need to resort to proprietary datasets, but additionally release an open-source Jupyter notebook to reproduce the
position bias curves visualised in Figure 2 at github.com/olivierjeunen/C-3PO-recsys-2023.
expected reward we would have obtained under some different ranking policy 𝜋. This policy maps a context 𝑋 and set
of candidate items A𝑥 to a ranked list. We will denote with the shorthand notation 𝜋 (𝑎|𝑥) the rank that item 𝑎 will
be placed at when 𝜋 is presented with context 𝑥 (assuming A𝑥 given). Note that this framing is easily extended to
more general stochastic ranking policies [26]. Then, a dataset D and position bias model P̂ can be used to obtain an
P̂empirical(V|R)     0.5157   0.5999   0.5843   0.4813   0.3369
P̂log(V|R, X)        0.4852   0.5620   0.5577   0.4806   0.3555
P̂exp(V|R, X)        0.4883   0.5761   0.5778   0.4959   0.3652
P̂prob(V|R, X)       0.4850   0.5620   0.5551   0.4651   0.3325

Table 1. NLL for position bias models on observed data; lower is better. The top group is independent of contextual information, the middle baseline is a non-parametric method that predicts a sample average, and the bottom group includes three parameters that were optimised via linear regression. Marked fields indicate statistically significant improvements over other methods in the same group at a 99% level.
Maximum Likelihood Estimation, amenable to gradient descent, allows for flexible and efficient optimisation. Using real-
world data from a social media platform, we have empirically validated that our novel methods model empirical exposure
significantly more accurately than competing methods, and that the obtained estimates pave the way for improvements
in unbiased offline evaluation. In future work, we wish to extend our approach to incorporate cascade-based alternatives.
REFERENCES
[1] A. Agarwal, I. Zaitsev, X. Wang, C. Li, M. Najork, and T. Joachims. 2019. Estimating Position Bias without Intrusive Interventions. In Proc. of the
Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19). ACM, 474–482. https://doi.org/10.1145/3289600.3291017
[2] Q. Ai, K. Bi, C. Luo, J. Guo, and W. B. Croft. 2018. Unbiased Learning to Rank with Unbiased Propensity Estimation. In The 41st International ACM
SIGIR Conference on Research & Development in Information Retrieval (SIGIR ’18). ACM, 385–394. https://doi.org/10.1145/3209978.3209986
[3] J. Beel, M. Genzmehr, S. Langer, A. Nürnberger, and B. Gipp. 2013. A Comparative Analysis of Offline and Online Evaluations and Discussion of
Research Paper Recommender System Evaluation. In Proc. of the International Workshop on Reproducibility and Replication in Recommender Systems
Evaluation (RepSys ’13). 7–14.
[4] P. Castells and A. Moffat. 2022. Offline recommender system evaluation: Challenges and new directions. AI Magazine 43, 2 (2022), 225–238.
https://doi.org/10.1002/aaai.12051
[5] O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. 2009. Expected Reciprocal Rank for Graded Relevance. In Proc. of the 18th ACM Conference on
Information and Knowledge Management (CIKM ’09). ACM, 621–630. https://doi.org/10.1145/1645953.1646033
[6] O. Chapelle and Y. Zhang. 2009. A Dynamic Bayesian Network Click Model for Web Search Ranking. In Proc. of the 18th International Conference on
World Wide Web (WWW ’09). ACM, 1–10. https://doi.org/10.1145/1526709.1526711
[7] J. Chen, H. Dong, X. Wang, F. Feng, M. Wang, and X. He. 2022. Bias and Debias in Recommender System: A Survey and Future Directions. ACM
Trans. Inf. Syst. (oct 2022). https://doi.org/10.1145/3564284 Just Accepted.
[8] A. Chuklin, I. Markov, and M. de Rijke. 2015. Click Models for Web Search. Morgan & Claypool. https://doi.org/10.2200/S00654ED1V01Y201507ICR043
[9] N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. 2008. An Experimental Comparison of Click Position-Bias Models. In Proc. of the 2008 International
Conference on Web Search and Data Mining (WSDM ’08). ACM, 87–94. https://doi.org/10.1145/1341531.1341545
[10] F. Diaz, B. Mitra, M. D. Ekstrand, A. J. Biega, and B. Carterette. 2020. Evaluating Stochastic Rankings with Expected Exposure. In Proc. of the 29th
ACM International Conference on Information & Knowledge Management (CIKM ’20). ACM, 275–284. https://doi.org/10.1145/3340531.3411962
[11] Z. Fang, A. Agarwal, and T. Joachims. 2019. Intervention Harvesting for Context-Dependent Examination-Bias Estimation. In Proc. of the 42nd
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’19). ACM, 825–834. https://doi.org/10.1145/
3331184.3331238
[12] F. Garcin, B. Faltings, O. Donatsch, A. Alazzawi, C. Bruttin, and A. Huber. 2014. Offline and Online Evaluation of News Recommender Systems at
Swissinfo.Ch. In Proc. of the 8th ACM Conference on Recommender Systems (RecSys ’14). 169–176. https://doi.org/10.1145/2645710.2645745
[13] A. Gilotte, C. Calauzènes, T. Nedelec, A. Abraham, and S. Dollé. 2018. Offline A/B Testing for Recommender Systems. In Proc. of the Eleventh ACM
International Conference on Web Search and Data Mining (WSDM ’18). ACM, 198–206. https://doi.org/10.1145/3159652.3159687
[14] K. Hofmann, A. Schuth, A. Bellogín, and M. de Rijke. 2014. Effects of Position Bias on Click-Based Recommender Evaluation. In Advances in
Information Retrieval. Springer International Publishing, Cham, 624–630.
[15] K. Järvelin and J. Kekäläinen. 2002. Cumulated Gain-Based Evaluation of IR Techniques. ACM Trans. Inf. Syst. 20, 4 (oct 2002), 422–446. https:
//doi.org/10.1145/582415.582418
[16] O. Jeunen. 2019. Revisiting Offline Evaluation for Implicit-feedback Recommender Systems. In Proc. of the 13th ACM Conference on Recommender
Systems (RecSys ’19). ACM, 596–600. https://doi.org/10.1145/3298689.3347069
[17] O. Jeunen. 2021. Offline Approaches to Recommendation with Online Success. Ph. D. Dissertation. University of Antwerp.
[18] O. Jeunen and B. Goethals. 2021. Top-K Contextual Bandits with Equity of Exposure. In Proc. of the 15th ACM Conference on Recommender Systems
(RecSys ’21). ACM, 310–320. https://doi.org/10.1145/3460231.3474248
[19] O. Jeunen, K. Verstrepen, and B. Goethals. 2018. Fair Offline Evaluation Methodologies for Implicit-feedback Recommender Systems with MNAR
Data. In Proc. of the REVEAL 18 Workshop on Offline Evaluation for Recommender Systems (RecSys ’18).
[20] T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay. 2005. Accurately Interpreting Clickthrough Data As Implicit Feedback. In Proc. of
the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’05). ACM, 154–161. https:
//doi.org/10.1145/1076034.1076063
[21] T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski, and G. Gay. 2007. Evaluating the Accuracy of Implicit Feedback from Clicks and Query
Reformulations in Web Search. ACM Trans. Inf. Syst. 25, 2 (apr 2007), 7–es. https://doi.org/10.1145/1229179.1229181
[22] T. Joachims, A. Swaminathan, and T. Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proc. of the Tenth ACM International
Conference on Web Search and Data Mining (WSDM ’17). ACM, 781–789. https://doi.org/10.1145/3018661.3018699
[23] N. L. Johnson, A. W. Kemp, and S. Kotz. 2005. Univariate discrete distributions. Vol. 444. John Wiley & Sons.
[24] M. J. Mei, C. Zuber, and Y. Khazaeni. 2022. A Lightweight Transformer for Next-Item Product Recommendation. In Proc. of the 16th ACM Conference
on Recommender Systems (RecSys ’22). ACM, 546–549. https://doi.org/10.1145/3523227.3547491
[25] A. Moffat and J. Zobel. 2008. Rank-Biased Precision for Measurement of Retrieval Effectiveness. ACM Trans. Inf. Syst. 27, 1, Article 2 (dec 2008),
27 pages. https://doi.org/10.1145/1416950.1416952
[26] H. Oosterhuis. 2021. Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness. In Proc. of the 44th
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’21). ACM, 1023–1032. https://doi.org/10.1145/
3404835.3462830
[27] H. Oosterhuis. 2023. Doubly Robust Estimation for Correcting Position Bias in Click Feedback for Unbiased Learning to Rank. ACM Trans. Inf. Syst.
41, 3, Article 61 (feb 2023), 33 pages. https://doi.org/10.1145/3569453
[28] H. Oosterhuis and M. de Rijke. 2020. Policy-Aware Unbiased Learning to Rank for Top-k Rankings. In Proc. of the 43rd International ACM SIGIR
Conference on Research and Development in Information Retrieval (SIGIR ’20). ACM, 489–498. https://doi.org/10.1145/3397271.3401102
[29] Z. Ovaisi, R. Ahsan, Y. Zhang, K. Vasilaky, and E. Zheleva. 2020. Correcting for Selection Bias in Learning-to-Rank Systems. In Proc. of The Web
Conference 2020 (WWW ’20). ACM, 1863–1873. https://doi.org/10.1145/3366423.3380255
[30] A. B. Owen. 2013. Monte Carlo theory, methods and examples.
[31] J. Pearl. 2009. Causality. Cambridge university press.
[32] M. Rossetti, F. Stella, and M. Zanker. 2016. Contrasting Offline and Online Results when Evaluating Recommendation Algorithms. In Proc. of the
10th ACM Conference on Recommender Systems (RecSys ’16). ACM, 31–34. https://doi.org/10.1145/2959100.2959176
[33] M. Ruffini, V. Bellini, A. Buchholz, G. Di Benedetto, and Y. Stein. 2022. Modeling Position Bias Ranking for Streaming Media Services. In Companion
Proc. of the Web Conference 2022 (WWW ’22). ACM, 72–76. https://doi.org/10.1145/3487553.3524210
[34] H. A. Simon. 1955. On a Class of Skew Distribution Functions. Biometrika 42, 3-4 (12 1955), 425–440. https://doi.org/10.1093/biomet/42.3-4.425
[35] A. Swaminathan and T. Joachims. 2015. The Self-Normalized Estimator for Counterfactual Learning. In Advances in Neural Information Processing
Systems. 3231–3239. https://proceedings.neurips.cc/paper/2015/file/39027dfad5138c9ca0c474d71db915c3-Paper.pdf
[36] D. Valcarce, A. Bellogín, J. Parapar, and P. Castells. 2020. Assessing ranking metrics in top-N recommendation. Information Retrieval Journal 23, 4
(01 Aug 2020), 411–448. https://doi.org/10.1007/s10791-020-09377-x
[37] A. Vardasbi, H. Oosterhuis, and M. de Rijke. 2020. When Inverse Propensity Scoring Does Not Work: Affine Corrections for Unbiased Learning
to Rank. In Proc. of the 29th ACM International Conference on Information & Knowledge Management (CIKM ’20). ACM, 1475–1484. https:
//doi.org/10.1145/3340531.3412031
[38] P. Virtanen, R. Gommers, T. E. Oliphant, et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 3
(2020), 261–272. https://doi.org/10.1038/s41592-019-0686-2
[39] E. M. Voorhees et al. 1999. The TREC-8 question answering track report. In TREC, Vol. 99. 77–82.
[40] X. Wang, N. Golbandi, M. Bendersky, D. Metzler, and M. Najork. 2018. Position Bias Estimation for Unbiased Learning to Rank in Personal Search.
In Proc. of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM ’18). ACM, 610–618. https://doi.org/10.1145/3159652.
3159732
[41] X. Wu, H. Chen, J. Zhao, L. He, D. Yin, and Y. Chang. 2021. Unbiased Learning to Rank in Feeds Recommendation. In Proc. of the 14th ACM
International Conference on Web Search and Data Mining (WSDM ’21). ACM, 490–498. https://doi.org/10.1145/3437963.3441751
[42] G. U. Yule. 1925. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F. R. S. Philosophical Transactions of the Royal
Society of London. Series B, Containing Papers of a Biological Character 213, 402-410 (1925), 21–87. https://doi.org/10.1098/rstb.1925.0002
[43] F. Zhang, Y. Liu, J. Mao, M. Zhang, and S. Ma. 2020. User behavior modeling for Web search evaluation. AI Open 1 (2020), 40–56. https:
//doi.org/10.1016/j.aiopen.2021.02.003