Proceeding Paper

Nonparametric Inference for Mixture Cure Model When Cure Information Is Partially Available †

by Wende Clarence Safari 1,2,*, Ignacio López-de-Ullibarri 2 and María Amalia Jácome 1,2

1 Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Universidade da Coruña, 15071 A Coruña, Spain
2 MODES Group, Department of Mathematics, Universidade da Coruña, 15071 A Coruña, Spain
* Author to whom correspondence should be addressed.
† Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.
Eng. Proc. 2021, 7(1), 17; https://doi.org/10.3390/engproc2021007017
Published: 9 October 2021
(This article belongs to the Proceedings of The 4th XoveTIC Conference)

Abstract

We introduce nonparametric estimators of the conditional survival function, the cure probability and the latency function in a mixture cure model when the cure status is partially known. As an illustration, we present an application concerning patients hospitalized with COVID-19 in Galicia (Spain) during the first outbreak of the epidemic.

1. Introduction

Survival analysis is concerned with the time until an event of interest occurs. Standard survival models assume that every individual would eventually experience the event if followed for long enough. Cure models [1] were developed for situations in which this assumption does not hold, for example, when the event is the recurrence of a disease or death from certain types of cancer, and some individuals will never experience it. A further challenge with time-to-event data is that the event is not always observed (censored observations). Standard cure models treat the cure status as an unobserved (latent) variable: an individual whose event is observed is known to be uncured, but for a censored individual it is unknown whether he or she is cured. In some situations, however, the cure status is known for some of the censored individuals, because they can be identified as insusceptible to the event, that is, known to be cured. This happens, for example, when a medical test ascertains that a disease has entirely disappeared after treatment.
In this paper, we present kernel methods to estimate the conditional survival function, the cure probability and the latency function when cure status information is partially available. The proposed approach contributes to the state of the art in time-to-event analysis by extending previous work on the mixture cure model.

2. Estimation When the Cure Status Is Partially Available

Let $Y$ be the time until the event of interest, $X$ a vector of covariates, and $F(t \mid x) = P(Y \le t \mid X = x)$ the distribution function of $Y$ conditional on $X = x$. In follow-up studies, the event of interest may not be observed due to, for example, the end of the study or loss to follow-up, which occurs at censoring time $C^*$ with conditional distribution function $G(t \mid x) = P(C^* \le t \mid X = x)$. As a consequence, instead of observing $Y$, only the possibly censored survival time $T^* = \min(Y, C^*)$ and the indicator of the event $\delta = \mathbf{1}(Y < C^*)$ can be observed. The random variables $Y$ and $C^*$ are assumed to be conditionally independent given $X = x$, which is a widely used assumption in most studies. We set $Y = \infty$ if the subject will not experience the event and so is cured. Let $\nu = \mathbf{1}(Y = \infty)$ be the indicator of being cured. Note that $\nu$ is partially observed because the individual is known not to be cured ($\nu = 0$) when the event is observed ($\delta = 1$), but in the general situation, $\nu$ is unknown when $\delta = 0$. When the cure status is partially known, some censored individuals are identified to be cured, so $\nu = 1$ is observed.
To accommodate the cure status information, we include an additional random variable $\xi$, which indicates whether the cure status $\nu$ is known ($\xi = 1$) or not ($\xi = 0$). Furthermore, let the censoring distribution be an improper distribution function $G(t \mid x) = (1 - \pi(x))\, G_0(t \mid x)$. Thus, with probability $\pi(x)$, the censoring variable is $C^* = \infty$, and with probability $1 - \pi(x)$ the value of the censoring variable $C^*$ corresponds to the value of a random variable $C$ with proper continuous distribution function $G_0(t \mid x)$. A cured individual is identified with probability $P(\xi = 1 \mid \nu = 1, X = x) = P(C^* = \infty \mid X = x) = \pi(x)$. In this setup, the data actually observed are $\{(X_i, T_i, \delta_i, \xi_i, \xi_i \nu_i) : i = 1, \dots, n\}$, where the observed time is $T_i = \min(Y_i, C_i^*) = T_i^*$, except for the individuals identified as cured, for whom $T_i = C_i$. Hence, the observations can be classified into three groups:
(a) the individual is observed to have experienced the event and, therefore, is known to be uncured: $(X_i, T_i = Y_i, \delta_i = 1, \xi_i = 1, \xi_i \nu_i = 0)$;
(b) the lifetime is censored and the cure status is unknown: $(X_i, T_i = C_i, \delta_i = 0, \xi_i = 0, \xi_i \nu_i = 0)$; and
(c) the lifetime is censored and the individual is known to be cured: $(X_i, T_i = C_i, \delta_i = 0, \xi_i = 1, \xi_i \nu_i = 1)$.
In standard cure models, where the cure status is unknown for all the censored observations, only groups (a) and (b) are considered.
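The sampling scheme above can be illustrated with a small simulation. This is only a sketch: the scalar covariate, the forms chosen for $p(x)$ and $\pi(x)$, the exponential distributions and the function name `simulate_observations` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def simulate_observations(n, seed=0):
    """Draw n observations (X, T, delta, xi, xi*nu) from a toy mixture cure model."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)        # scalar covariate (assumed)
    p = 0.4 + 0.4 * x                        # hypothetical p(x) = P(uncured | x)
    pi = np.full(n, 0.5)                     # hypothetical pi(x), constant here

    nu = rng.uniform(size=n) >= p            # nu = 1: cured, i.e. Y = infinity
    y = np.where(nu, np.inf, rng.exponential(1.0, size=n))
    c = rng.exponential(2.0, size=n)         # C with proper distribution G_0
    c_infinite = rng.uniform(size=n) < pi    # C* = infinity with probability pi(x)
    c_star = np.where(c_infinite, np.inf, c)

    delta = y < c_star                       # event observed
    known_cured = nu & c_infinite            # cured and identified as such
    xi = delta | known_cured                 # cure status is known
    # Observed time: min(Y, C*), except T = C for those identified as cured.
    t = np.where(known_cured, c, np.minimum(y, c_star))
    # Groups (a), (b), (c) as described in the text.
    group = np.where(delta, "a", np.where(known_cured, "c", "b"))
    return {
        "x": x,
        "t": t,
        "delta": delta.astype(int),
        "xi": xi.astype(int),
        "xi_nu": known_cured.astype(int),
        "group": group,
    }


sample = simulate_observations(1000)
labels, counts = np.unique(sample["group"], return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

Setting `pi` to zero everywhere removes group (c) and gives the standard cure model setting, in which the cure status of every censored observation is unknown.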
The probability of cure is $1 - p(x) = P(Y = \infty \mid X = x)$, and the conditional survival function of the uncured individuals, also known as latency, is $S_0(t \mid x) = P(Y > t \mid Y < \infty, X = x)$. The mixture cure model specifies the survival function $S(t \mid x) = P(Y > t \mid X = x)$ as the following.
$$S(t \mid x) = 1 - p(x) + p(x)\, S_0(t \mid x). \tag{1}$$
Assuming model (1) and the availability of a suitable estimator of $S(t \mid x)$, estimators of the cure probability and the latency can be derived by considering the following relationships.
$$1 - p(x) = \lim_{t \to \infty} S(t \mid x) > 0, \qquad S_0(t \mid x) = \frac{S(t \mid x) - \{1 - p(x)\}}{p(x)}. \tag{2}$$
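As a quick numerical check of how the mixture formula and its inversion fit together, the sketch below builds $S$ from an assumed latency and recovers the latency again. The exponential latency, the value $p = 0.7$ and the function names are illustrative assumptions only.

```python
import numpy as np


def mixture_survival(p, s0):
    """Survival function of the mixture cure model: S = 1 - p + p * S0."""
    return 1.0 - p + p * s0


def latency_from_survival(s, p):
    """Invert the mixture formula: S0 = (S - (1 - p)) / p, assuming p > 0."""
    return (s - (1.0 - p)) / p


t = np.linspace(0.0, 5.0, 6)
p = 0.7                      # hypothetical probability of being uncured
s0 = np.exp(-t)              # hypothetical latency (unit exponential)
s = mixture_survival(p, s0)
recovered = latency_from_survival(s, p)
print(np.allclose(recovered, s0))
```

As $t$ grows, `s` levels off at $1 - p$, which is the plateau that identifies the cure probability.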
Safari et al. [2] proposed the generalized product-limit estimator of the conditional survival function $S(t \mid x)$ when the cure status is partially known, which is the following:
$$\hat{S}_h^c(t \mid x) = \prod_{i=1}^{n} \left( 1 - \frac{\delta_{[i]}\, B_{h[i]}(x)\, \mathbf{1}(T_{(i)} \le t)}{\sum_{j=i}^{n} B_{h[j]}(x) + \sum_{j=1}^{i-1} B_{h[j]}(x)\, \mathbf{1}(\xi_{[j]} \nu_{[j]} = 1)} \right), \tag{3}$$
where $X_{[i]}$, $\delta_{[i]}$, $\xi_{[i]}$, and $\nu_{[i]}$ are the concomitants of the ordered observed times $T_{(1)} \le \dots \le T_{(n)}$, and $B_{h[i]}(x)$ is the Nadaraya–Watson (NW) weight of the following:
$$B_{h[i]}(x) = \frac{K_h(x - X_{[i]})}{\sum_{j=1}^{n} K_h(x - X_j)},$$
where $K_h(\cdot) = K(\cdot / h) / h$ is a kernel function $K(\cdot)$ rescaled with bandwidth $h$. The corresponding estimator of the cure rate $1 - p(x)$ [3] is the following:
$$1 - \hat{p}_h^c(x) = \hat{S}_h^c\left(T_{(n)}^1 \mid x\right), \tag{4}$$
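The NW weights, the generalized product-limit estimator and the cure rate estimator can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes a scalar covariate, a Gaussian kernel, no tied observed times and a kernel sum that does not underflow to zero; the function names and the toy data are hypothetical.

```python
import numpy as np


def nw_weights(x0, x, h):
    """Nadaraya-Watson weights at x0 with a Gaussian kernel (assumed choice)."""
    u = (x0 - x) / h
    k = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h)
    return k / k.sum()


def survival_estimate(t, x0, x, times, delta, known_cured, h):
    """Generalized product-limit estimate of S(t | x0).

    known_cured plays the role of xi * nu: 1 if identified as cured, else 0.
    """
    order = np.argsort(times, kind="stable")
    t_ord = times[order]
    d = delta[order].astype(float)
    kc = known_cured[order].astype(float)
    b = nw_weights(x0, x, h)[order]

    at_risk = np.cumsum(b[::-1])[::-1]                  # sum of B_j for j >= i
    cured_before = np.concatenate(([0.0], np.cumsum(b * kc)[:-1]))  # j < i
    denom = at_risk + cured_before
    ratio = np.divide(d * b, denom, out=np.zeros_like(denom), where=denom > 0)
    # Only factors with T_(i) <= t contribute to the product.
    return float(np.prod(np.where(t_ord <= t, 1.0 - ratio, 1.0)))


def cure_probability_estimate(x0, x, times, delta, known_cured, h):
    """Estimate 1 - p(x0) as the survival estimate at the largest uncensored time."""
    t_max_uncensored = times[delta.astype(bool)].max()
    return survival_estimate(t_max_uncensored, x0, x, times, delta, known_cured, h)


# Toy data (hypothetical): four subjects, the second one is known to be cured.
x = np.array([0.1, 0.2, 0.3, 0.4])
times = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1, 0, 1, 0])
known_cured = np.array([0, 1, 0, 0])
print(survival_estimate(3.0, 0.25, x, times, delta, known_cured, h=0.2))
print(cure_probability_estimate(0.25, x, times, delta, known_cured, h=0.2))
```

Individuals identified as cured stay in the denominator at all later times, which is what distinguishes this estimator from the usual conditional product-limit estimator. With no known cures and a very large bandwidth, the weights become nearly equal and the estimate approaches the Kaplan–Meier estimate.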
where $T_{(n)}^1$ is the largest uncensored observed time. Here, in light of (3), (4), and the relation in (2), a nonparametric estimator of the latency function is given by the following.
$$\hat{S}_{0,h_1,h_2}^c(t \mid x) = \begin{cases} \dfrac{\hat{S}_{h_2}^c(t \mid x) - \left(1 - \hat{p}_{h_1}^c(x)\right)}{\hat{p}_{h_1}^c(x)} & \text{if } 0 \le t \le T_{(n)}^1 \text{ and } \hat{S}_{h_2}^c(t \mid x) > 1 - \hat{p}_{h_1}^c(x), \\ 0 & \text{otherwise}. \end{cases} \tag{5}$$
The optimal bandwidth for $\hat{S}_h^c(t \mid x)$ in (3) is not necessarily the optimal bandwidth for $1 - \hat{p}_h^c(x)$ in (4); therefore, the estimator in (5) is a more general estimator that uses two different bandwidths for estimating $S(t \mid x)$ and $1 - p(x)$. Note that if $h = h_1 = h_2$, then the estimator in (5) reduces to the following estimator.
$$\hat{S}_{0,h}^c(t \mid x) = \frac{\hat{S}_h^c(t \mid x) - \left(1 - \hat{p}_h^c(x)\right)}{\hat{p}_h^c(x)}.$$
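The piecewise definition of the latency estimator can be sketched as a small function that takes already computed estimates as input. The function name, argument names and numbers below are hypothetical; `s_hat` stands for values of the survival estimate computed with bandwidth $h_2$, and `cure_hat` for the cure rate estimate computed with bandwidth $h_1$.

```python
import numpy as np


def latency_estimate(s_hat, cure_hat, t, t_max_uncensored):
    """Latency estimate from survival estimates s_hat at times t.

    cure_hat is the estimate of 1 - p(x); the result is set to zero outside
    [0, t_max_uncensored] and wherever s_hat does not exceed cure_hat.
    """
    s_hat = np.asarray(s_hat, dtype=float)
    t = np.asarray(t, dtype=float)
    inside = (t >= 0) & (t <= t_max_uncensored) & (s_hat > cure_hat)
    out = np.zeros_like(s_hat)
    out[inside] = (s_hat[inside] - cure_hat) / (1.0 - cure_hat)
    return out


# Hypothetical inputs: survival estimates on a grid and a cure rate estimate.
t_grid = np.array([0.0, 1.0, 3.0, 4.0])
s_values = np.array([1.0, 0.8, 0.5, 0.5])
print(latency_estimate(s_values, cure_hat=0.5, t=t_grid, t_max_uncensored=3.0))
```

Passing values computed with the same bandwidth for both inputs corresponds to the single-bandwidth special case.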

3. Application to COVID-19 Data

To illustrate the nonparametric estimators described in Section 2, we present an application concerning patients hospitalized with COVID-19 in Galicia (Spain) during the first outbreak of the epidemic. The medical database comprises 10,454 COVID-19 patients reported by the Galician Healthcare Service between 6 March and 7 May 2020. It contains information on sex, age, and the dates of different medical outcomes such as admission to the intensive care unit (ICU), discharge, or death. The aim was to estimate the time from admission to the hospital ward until admission to the ICU while adjusting for age and sex. Our analysis included only the 2380 patients who had been hospitalized for at least a day. Among them, 8.3% were admitted to the ICU and 91.7% were censored. In the censored group, 68.8% of patients were discharged from the hospital alive and without the need for ICU, and 13.8% died without entering the ICU. Therefore, these patients were identified to be “cured” from the event of interest, which is admission to the ICU. Note that in this example, “being cured” means never experiencing admission to the ICU, not being cured in medical terms.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work has been supported by MINECO grant MTM2017-82724-R and the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-014). We also wish to acknowledge the support received from the Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia and the European Union (European Regional Development Fund, Galicia 2014–2020 Program) through grant ED431G 2019/01. The authors are grateful to Andrés Paz-Ares Rodríguez (General Director of Public Health), Xurxo Hervada Vidal (General Deputy Director of Information on Health and Epidemiology), and Benigno Rosón Calvo (General Deputy Director of the SERGAS Information System) for providing the COVID-19 data.

References

  1. Peng, Y.; Yu, B. Cure Models: Methods, Applications, and Implementation; Chapman and Hall/CRC: Boca Raton, FL, USA, 2021.
  2. Safari, W.C.; López-de-Ullibarri, I.; Jácome, M.A. A product-limit estimator of the conditional survival function when cure status is partially known. Biometr. J. 2021, 63, 984–1005.
  3. Safari, W.C.; López-de-Ullibarri, I.; Jácome, M.A. Nonparametric Kernel Estimation of the Probability of Cure in a Mixture Cure Model When the Cure Status Is Partially Observed. Submitted. 2021. Available online: https://dm.udc.es/preprint/main_paper_cure_rate_Safari_et_al.pdf (accessed on 29 September 2021).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
