-
Identification of distributions for risks based on the first moment and c-statistic
Authors:
Mohsen Sadatsafavi,
Tae Yoon Lee,
John Petkau
Abstract:
We show that for any family of distributions with support on [0,1] with strictly monotonic cumulative distribution function (CDF) that has no jumps and is quantile-identifiable (i.e., any two distinct quantiles identify the distribution), knowing the first moment and c-statistic is enough to identify the distribution. The derivations motivate numerical algorithms for mapping a given pair of expected value and c-statistic to the parameters of specified two-parameter distributions for probabilities. We implemented these algorithms in R and in a simulation study evaluated their numerical accuracy for common families of distributions for risks (beta, logit-normal, and probit-normal). An area of application for these developments is in risk prediction modeling (e.g., sample size calculations and Value of Information analysis), where one might need to estimate the parameters of the distribution of predicted risks from the reported summary statistics.
Submitted 13 September, 2024;
originally announced September 2024.
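The mapping described in the abstract above, from a pair (expected value, c-statistic) to the parameters of a two-parameter distribution, can be illustrated for the beta family. What follows is a minimal sketch and not the authors' R implementation. It assumes that risks are calibrated, so cases are drawn with weight p and controls with weight 1 - p, and that for a fixed mean the c-statistic falls as the concentration a + b grows. The function names, the midpoint grid, and the bisection bounds are all hypothetical choices made for this illustration.

```python
import math


def c_statistic_beta(a, b, n=2000):
    """Approximate the c-statistic of calibrated risks p ~ Beta(a, b).

    Uses a midpoint grid on (0, 1); cases are weighted by p and
    controls by (1 - p), which assumes the risks are calibrated.
    """
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    w = [math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p)) * h
         for p in grid]
    total = sum(w)
    w = [x / total for x in w]  # renormalize to absorb grid error

    mean = sum(p * wi for p, wi in zip(grid, w))
    cum_controls = 0.0  # control mass at risks below the current grid cell
    concordant = 0.0
    for p, wi in zip(grid, w):
        case_mass = p * wi
        control_mass = (1 - p) * wi
        # ties within a grid cell count as half-concordant
        concordant += case_mass * (cum_controls + 0.5 * control_mass)
        cum_controls += control_mass
    return concordant / (mean * (1 - mean))


def beta_from_mean_c(mean, c, lo=1e-2, hi=1e4, iters=50):
    """Bisect on the log of the concentration s = a + b at a fixed mean.

    Assumes c decreases in s for a fixed mean (an assumption of this sketch).
    """
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (log_lo + log_hi)
        s = math.exp(mid)
        if c_statistic_beta(mean * s, (1 - mean) * s) > c:
            log_lo = mid  # too discriminating: need a larger concentration
        else:
            log_hi = mid
    s = math.exp(0.5 * (log_lo + log_hi))
    return mean * s, (1 - mean) * s
```

As a sanity check, uniform risks, Beta(1, 1), give a c-statistic of 5/6 under these assumptions, which the grid approximation reproduces closely.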
-
Non-parametric inference on calibration of predicted risks
Authors:
Mohsen Sadatsafavi,
John Petkau
Abstract:
Moderate calibration, the expected event probability among observations with predicted probability z being equal to z, is a desired property of risk prediction models. Current graphical and numerical techniques for evaluating moderate calibration of risk prediction models are mostly based on smoothing or grouping the data. As well, there is no widely accepted inferential method for the null hypothesis that a model is moderately calibrated. In this work, we discuss recently-developed, and propose novel, methods for the assessment of moderate calibration for binary responses. The methods are based on the limiting distributions of functions of standardized partial sums of prediction errors converging to the corresponding laws of Brownian motion. The novel method relies on well-known properties of the Brownian bridge which enables joint inference on mean and moderate calibration, leading to a unified "bridge" test for detecting miscalibration. Simulation studies indicate that the bridge test is more powerful, often substantially, than the alternative test. As a case study we consider a prediction model for short-term mortality after a heart attack, where we provide suggestions on graphical presentation and the interpretation of results. Moderate calibration can be assessed without requiring arbitrary grouping of data or using methods that require tuning of parameters. An accompanying R package implements this method (see https://github.com/resplab/cumulcalib/).
Submitted 23 May, 2024; v1 submitted 18 July, 2023;
originally announced July 2023.
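The "standardized partial sums of prediction errors" mentioned in the abstract above can be sketched as follows. This is an illustrative reconstruction and not the API of the cumulcalib package. It assumes that observations are ordered by predicted risk, that each prediction error y - p has variance p(1 - p) under the null hypothesis of calibration, and that "time" is measured as the cumulative share of that variance. The function name is hypothetical.

```python
import math


def cumulative_calibration_path(risks, outcomes):
    """Return (times, path) for the scaled partial sums of prediction errors.

    Observations are sorted by predicted risk. Under the assumed null of
    calibration, each error y - p has variance p * (1 - p), so the path is
    scaled by the square root of the total variance.
    """
    pairs = sorted(zip(risks, outcomes))
    total_var = sum(p * (1 - p) for p, _ in pairs)
    scale = math.sqrt(total_var)

    times, path = [], []
    cum_var = 0.0
    cum_err = 0.0
    for p, y in pairs:
        cum_var += p * (1 - p)
        cum_err += y - p
        times.append(cum_var / total_var)  # runs from near 0 up to 1
        path.append(cum_err / scale)       # standardized partial sum
    return times, path
```

Under these assumptions the path should behave approximately like Brownian motion on [0, 1] when the model is calibrated, so a large terminal value or a large excursion suggests miscalibration. The test statistics and their limiting distributions are given in the paper and are not reproduced here.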
-
Model-based ROC (mROC) curve: examining the effect of case-mix and model calibration on the ROC plot
Authors:
Mohsen Sadatsafavi,
Paramita Saha-Chaudhuri,
John Petkau
Abstract:
The performance of risk prediction models is often characterized in terms of discrimination and calibration. The Receiver Operating Characteristic (ROC) curve is widely used for evaluating model discrimination. When evaluating the performance of a risk prediction model in a new sample, the shape of the ROC curve is affected by both case-mix and the postulated model. Further, compared to discrimination, evaluating calibration has not received the same level of attention. Commonly used methods for model calibration involve subjective specification of smoothing or grouping. Leveraging the familiar ROC framework, we introduce the model-based ROC (mROC) curve to assess the calibration of a pre-specified model in a new sample. mROC curve is the ROC curve that should be observed if a pre-specified model is calibrated in the sample. We show the empirical ROC and mROC curves for a sample converge asymptotically if the model is calibrated in that sample. As a consequence, the mROC curve can be used to assess visually the effect of case-mix and model mis-calibration. Further, we propose a novel statistical test for calibration that does not require any smoothing or grouping. Simulations support the adequacy of the test. A case study puts these developments in a practical context. We conclude that mROC can easily be constructed and used to evaluate the effect of case-mix and model calibration on the ROC plot, thus adding to the utility of ROC curve analysis in the evaluation of risk prediction models. R code for the proposed methodology is provided (https://github.com/msadatsafavi/mROC/).
Submitted 12 July, 2021; v1 submitted 29 February, 2020;
originally announced March 2020.
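The idea in the abstract above, an ROC curve "that should be observed if a pre-specified model is calibrated", can be sketched using predicted risks alone. This is a minimal illustration and not the code in the mROC repository. It assumes that under calibration an observation with predicted risk p contributes p to the expected number of cases and 1 - p to the expected number of controls. The function names are hypothetical.

```python
def mroc_points(risks):
    """Model-based ROC points computed from predicted risks only.

    Assumes calibration: each risk p counts as p of a case and
    (1 - p) of a control. Thresholds sweep from the highest risk down.
    """
    ordered = sorted(risks, reverse=True)
    total_pos = sum(ordered)
    total_neg = len(ordered) - total_pos

    fpr, tpr = [0.0], [0.0]
    cum_pos = 0.0
    cum_neg = 0.0
    for p in ordered:
        cum_pos += p
        cum_neg += 1 - p
        tpr.append(cum_pos / total_pos)
        fpr.append(cum_neg / total_neg)
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under a curve given by matching x and y lists."""
    return sum(0.5 * (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1])
               for i in range(1, len(fpr)))
```

Plotting these points next to the empirical ROC curve for the same sample gives the visual comparison the abstract describes: under the stated assumption, a clear gap between the two curves points to miscalibration rather than to case-mix alone.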