Nonparametric Inference for Mixture Cure Model When Cure Information Is Partially Available

Safari, Wende Clarence; López-de-Ullibarri, Ignacio; Jácome, María Amalia

doi:10.3390/engproc2021007017

Wende Clarence Safari ^1,2,*, Ignacio López-de-Ullibarri ^2 and María Amalia Jácome ^1,2

^1 Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Universidade da Coruña, 15071 A Coruña, Spain
^2 MODES Group, Department of Mathematics, Universidade da Coruña, 15071 A Coruña, Spain
^* Author to whom correspondence should be addressed.

Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.

Eng. Proc. 2021, 7(1), 17; https://doi.org/10.3390/engproc2021007017

Published: 9 October 2021

(This article belongs to the Proceedings of The 4th XoveTIC Conference)


Abstract

We introduce nonparametric estimators of the conditional survival function, the cure probability and the latency function in a mixture cure model when the cure status is partially known. As an illustration, we present an application to patients hospitalized with COVID-19 in Galicia (Spain) during the first outbreak of the epidemic.

Keywords: COVID-19; ICU; kernel estimators; mixture cure model; survival analysis

1. Introduction

Survival analysis arises in many applications where the quantity of interest is the time until a given event occurs. A common assumption in standard survival modeling is that all individuals will experience the event if observed for long enough. Cure models [1] have been developed for situations where this assumption does not hold, for example, when the event is the recurrence of some diseases or death from some types of cancer. One challenge with time-to-event data is that the event is not always observed (censored observations). Standard cure models typically treat the cure status as an unobserved (latent) variable: uncensored subjects are known to be uncured because they experienced the event, but for a censored subject it is unknown whether he or she is cured. There are situations, however, where the cure status is known for some of the censored individuals because they can be identified as insusceptible to the event, that is, known to be cured. An example is when a medical test ascertains that a disease has entirely disappeared after treatment.

In this paper, we present kernel methods to estimate the conditional survival function, the cure probability and the latency function when cure status information is partially available. The proposed approach contributes to the state of the art in time-to-event analysis by extending previous work on the mixture cure model.

2. Estimation When the Cure Status Is Partially Available

Let $Y$ be the time until the event of interest, let $X$ be a vector of covariates, and let $F(t \mid x) = P(Y \leq t \mid X = x)$ be the distribution function of $Y$ conditional on $X = x$. In follow-up studies, the event of interest may not be observed because of, for example, the end of the study or loss to follow-up, which occurs at the censoring time $C^{*}$ with conditional distribution function $G(t \mid x) = P(C^{*} \leq t \mid X = x)$. As a consequence, instead of observing $Y$, only the possibly censored survival time $T^{*} = \min(Y, C^{*})$ and the event indicator $\delta = \mathbf{1}(Y < C^{*})$ can be observed. The random variables $Y$ and $C^{*}$ are assumed to be conditionally independent given $X = x$, which is a standard assumption in survival analysis. We set $Y = \infty$ if the subject will not experience the event and so is cured. Let $\nu = \mathbf{1}(Y = \infty)$ be the indicator of being cured. Note that $\nu$ is only partially observed: the individual is known not to be cured ($\nu = 0$) when the event is observed ($\delta = 1$), but in the general situation $\nu$ is unknown when $\delta = 0$. When the cure status is partially known, some censored individuals are identified as cured, so $\nu = 1$ is observed for them.

To accommodate the cure status information, we include an additional random variable $\xi$, which indicates whether the cure status $\nu$ is known ($\xi = 1$) or not ($\xi = 0$). Furthermore, let the censoring distribution be an improper distribution function $G(t \mid x) = (1 - \pi(x))\, G_{0}(t \mid x)$. Thus, with probability $\pi(x)$ the censoring variable is $C^{*} = \infty$, and with probability $1 - \pi(x)$ the value of $C^{*}$ is that of a random variable $C$ with proper continuous distribution function $G_{0}(t \mid x)$. A cured individual is identified with probability $P(\xi = 1 \mid \nu = 1, X = x) = P(C^{*} = \infty \mid X = x) = \pi(x)$. In this setup, the data actually observed are $\{(X_{i}, T_{i}, \delta_{i}, \xi_{i}, \xi_{i}\nu_{i}) : i = 1, \dots, n\}$, where the observed time is $T_{i} = \min(Y_{i}, C_{i}^{*}) = T_{i}^{*}$, except for the individuals identified as cured, for whom $T_{i} = C_{i}$. Hence, the observations can be classified into three groups:

(a) the individual is observed to have experienced the event and, therefore, is known to be uncured: $(X_{i}, T_{i} = Y_{i}, \delta_{i} = 1, \xi_{i} = 1, \xi_{i}\nu_{i} = 0)$;
(b) the lifetime is censored and the cure status is unknown: $(X_{i}, T_{i} = C_{i}, \delta_{i} = 0, \xi_{i} = 0, \xi_{i}\nu_{i} = 0)$;
(c) the lifetime is censored and the individual is known to be cured: $(X_{i}, T_{i} = C_{i}, \delta_{i} = 0, \xi_{i} = 1, \xi_{i}\nu_{i} = 1)$.

In standard cure models, where the cure status is unknown for all the censored observations, only groups (a) and (b) are considered.
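The observation scheme described above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' code: the function name `simulate_partially_known_cure`, the uniform covariate, the exponential latency and censoring distributions, and the constant values of $p$ and $\pi$ are all illustrative assumptions.

```python
import numpy as np


def simulate_partially_known_cure(n, p_uncured=0.6, pi_known=0.5, seed=0):
    """Simulate (X, T, delta, xi, xi*nu) under partially known cure status.

    Assumptions (not from the paper): X ~ U(0, 1), exponential latency,
    exponential proper censoring time C, constant p(x) and pi(x).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)

    # nu = 1(Y = inf): cured with probability 1 - p
    nu = rng.random(n) >= p_uncured
    y = np.where(nu, np.inf, rng.exponential(1.0, size=n))

    # C* = inf with probability pi, otherwise C* = C with C ~ G_0
    c = rng.exponential(2.0, size=n)
    c_is_inf = rng.random(n) < pi_known
    c_star = np.where(c_is_inf, np.inf, c)

    # delta = 1(Y < C*): the event is observed
    delta = (y < c_star).astype(int)

    # A cured individual is identified as cured exactly when C* = inf
    known_cured = nu & c_is_inf

    # Cure status is known for events (uncured) and for identified cures
    xi = ((delta == 1) | known_cured).astype(int)

    # Observed time: Y for events, the finite censoring time C otherwise
    t = np.where(delta == 1, y, c)

    return x, t, delta, xi, known_cured.astype(int)


x, t, delta, xi, xi_nu = simulate_partially_known_cure(1000)
group_a = int(np.sum(delta == 1))                 # event observed
group_b = int(np.sum((delta == 0) & (xi == 0)))   # censored, cure status unknown
group_c = int(np.sum(xi_nu == 1))                 # censored, known to be cured
print(group_a, group_b, group_c)
```

Every simulated observation falls into exactly one of the groups (a), (b) and (c); setting `pi_known=0` removes group (c) and gives the standard cure model setting.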

The probability of cure is $1 - p(x) = P(Y = \infty \mid X = x)$, and the conditional survival function of the uncured individuals, also known as the latency, is $S_{0}(t \mid x) = P(Y > t \mid Y < \infty, X = x)$. The mixture cure model specifies the survival function $S(t \mid x) = P(Y > t \mid X = x)$ as follows:

$$S(t \mid x) = 1 - p(x) + p(x)\, S_{0}(t \mid x). \tag{1}$$

Assuming model (1) and the availability of a suitable estimator of $S(t \mid x)$, estimators of the cure probability and the latency can be derived from the following relationships:

$$1 - p(x) = \lim_{t \to \infty} S(t \mid x) > 0, \qquad S_{0}(t \mid x) = \frac{S(t \mid x) - (1 - p(x))}{p(x)}. \tag{2}$$
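For completeness, both relationships in (2) follow directly from model (1). The short derivation below only uses the fact that the latency is a proper survival function, so that $S_{0}(t \mid x) \to 0$ as $t \to \infty$, and it assumes $p(x) > 0$ so that the division is well defined.

```latex
\begin{align*}
\lim_{t \to \infty} S(t \mid x)
  &= 1 - p(x) + p(x) \lim_{t \to \infty} S_{0}(t \mid x)
   = 1 - p(x), \\
S(t \mid x) - (1 - p(x))
  &= p(x)\, S_{0}(t \mid x)
  \;\Longrightarrow\;
  S_{0}(t \mid x) = \frac{S(t \mid x) - (1 - p(x))}{p(x)} .
\end{align*}
```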

Safari et al. [2] proposed the following generalized product-limit estimator of the conditional survival function $S(t \mid x)$ when the cure status is partially known:

$$\hat{S}_{h}^{c}(t \mid x) = \prod_{i=1}^{n} \left(1 - \frac{\delta_{[i]}\, B_{h[i]}(x)\, \mathbf{1}(T_{(i)} \leq t)}{\sum_{j=i}^{n} B_{h[j]}(x) + \sum_{j=1}^{i-1} B_{h[j]}(x)\, \mathbf{1}(\xi_{[j]}\nu_{[j]} = 1)}\right), \tag{3}$$

where $X_{[i]}$, $\delta_{[i]}$, $\xi_{[i]}$, and $\nu_{[i]}$ are the concomitants of the ordered observed times $T_{(1)} \leq \dots \leq T_{(n)}$, and $B_{h[i]}(x)$ is the Nadaraya–Watson (NW) weight

$$B_{h[i]}(x) = \frac{K_{h}(x - X_{[i]})}{\sum_{j=1}^{n} K_{h}(x - X_{j})},$$

with $K_{h}(\cdot) = K(\cdot / h) / h$ a kernel function $K(\cdot)$ rescaled with bandwidth $h$. The corresponding estimator of the cure rate $1 - p(x)$ [3] is the following:

$$1 - \hat{p}_{h}^{c}(x) = \hat{S}_{h}^{c}(T_{(n)}^{1} \mid x), \tag{4}$$
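To show how (3) and (4) can be evaluated, here is a minimal NumPy sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the covariate is taken to be scalar, the Epanechnikov kernel is an arbitrary choice for $K$, and the observed times are assumed to have no ties.

```python
import numpy as np


def nw_weights(x0, x, h):
    """Nadaraya-Watson weights B_h(x0); Epanechnikov kernel is an assumption."""
    u = (x0 - x) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / h
    total = k.sum()
    if total == 0.0:
        raise ValueError("no observations within bandwidth h of x0")
    return k / total


def survival_estimate(t, x0, x, time, delta, xi_nu, h):
    """Generalized product-limit estimate of S(t | x0), following Eq. (3)."""
    order = np.argsort(time, kind="stable")
    time_s, delta_s, cured_s = time[order], delta[order], xi_nu[order]
    b = nw_weights(x0, x, h)[order]

    # sum_{j >= i} B_[j](x): weight of observations with time >= T_(i)
    at_risk = np.cumsum(b[::-1])[::-1]
    # sum_{j < i} B_[j](x) 1(xi nu = 1): known cures censored earlier
    # keep contributing to the denominator
    cured_before = np.concatenate(([0.0], np.cumsum(b * cured_s)[:-1]))
    denom = at_risk + cured_before

    num = delta_s * b * (time_s <= t)
    factors = np.ones_like(b)
    pos = num > 0  # denom >= b > 0 wherever the numerator is positive
    factors[pos] = 1.0 - num[pos] / denom[pos]
    return float(np.prod(factors))


def cure_probability_estimate(x0, x, time, delta, xi_nu, h):
    """Estimate of 1 - p(x0): Eq. (4), S-hat at the largest uncensored time."""
    t_max_event = time[delta == 1].max()
    return survival_estimate(t_max_event, x0, x, time, delta, xi_nu, h)


# Toy data with identical covariates, so all NW weights equal 1/4
x = np.zeros(4)
time = np.array([1.0, 2.0, 3.0, 4.0])
delta = np.array([1, 0, 1, 0])
no_cures = np.array([0, 0, 0, 0])
one_cure = np.array([0, 1, 0, 0])
print(survival_estimate(3.0, 0.0, x, time, delta, no_cures, h=1.0))
print(survival_estimate(3.0, 0.0, x, time, delta, one_cure, h=1.0))
```

In the toy data, with no known cures and equal weights, the estimate coincides with the Kaplan-Meier value at $t = 3$. Marking the observation censored at $t = 2$ as a known cure keeps it in the denominator at $t = 3$, which raises the estimate.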

where $T_{(n)}^{1}$ is the largest uncensored observed time. In light of (3), (4), and the relation in (2), a nonparametric estimator of the latency function is given by the following:

$$\hat{S}_{0,h_{1},h_{2}}^{c}(t \mid x) = \begin{cases} \dfrac{\hat{S}_{h_{2}}^{c}(t \mid x) - (1 - \hat{p}_{h_{1}}^{c}(x))}{\hat{p}_{h_{1}}^{c}(x)} & \text{if } 0 \leq t \leq T_{(n)}^{1} \text{ and } \hat{S}_{h_{2}}^{c}(t \mid x) > 1 - \hat{p}_{h_{1}}^{c}(x), \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

The optimal bandwidth for $\hat{S}_{h}^{c}(t \mid x)$ in (3) is not necessarily the optimal bandwidth for $1 - \hat{p}_{h}^{c}(x)$ in (4); therefore, the estimator in (5) is more general in that it uses two different bandwidths, $h_{2}$ for estimating $S(t \mid x)$ and $h_{1}$ for estimating $1 - p(x)$. Note that if $h = h_{1} = h_{2}$, then the estimator in (5) reduces to the following estimator:

$$\hat{S}_{0,h}^{c}(t \mid x) = \frac{\hat{S}_{h}^{c}(t \mid x) - (1 - \hat{p}_{h}^{c}(x))}{\hat{p}_{h}^{c}(x)}.$$

No case distinction is needed here because $\hat{S}_{h}^{c}(\cdot \mid x)$ is nonincreasing and constant from $T_{(n)}^{1}$ onward, so the numerator is nonnegative and vanishes for $t \geq T_{(n)}^{1}$.
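The case distinction in (5) is easy to express in code. The sketch below is illustrative only: the function name `latency_estimate` and its arguments are hypothetical, and it takes as inputs already computed values of $\hat{S}_{h_{2}}^{c}(t \mid x)$ and $1 - \hat{p}_{h_{1}}^{c}(x)$ rather than recomputing them.

```python
def latency_estimate(s_hat, cure_hat, t, t_max_event):
    """Latency estimate following Eq. (5).

    s_hat       : estimate of S(t | x) obtained with bandwidth h2
    cure_hat    : estimate of the cure probability 1 - p(x), bandwidth h1
    t           : time point at which the latency is evaluated
    t_max_event : largest uncensored observed time
    """
    if 0.0 <= t <= t_max_event and s_hat > cure_hat:
        # s_hat <= 1 and s_hat > cure_hat imply 1 - cure_hat > 0
        return (s_hat - cure_hat) / (1.0 - cure_hat)
    return 0.0


# Hypothetical inputs: S-hat = 0.7 at t = 1, cure probability estimate 0.4
print(latency_estimate(0.7, 0.4, t=1.0, t_max_event=5.0))
```

Truncating at zero matters only when $h_{1} \neq h_{2}$, since the two estimates then come from different smoothing levels and their difference need not be nonnegative.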

3. Application to COVID-19 Data

To illustrate the nonparametric estimators presented in Section 2, we consider an application to patients hospitalized with COVID-19 in Galicia (Spain) during the first outbreak of the epidemic. The medical database comprises 10,454 COVID-19 patients reported by the Galician Healthcare Service between 6 March and 7 May 2020. It contains information on sex, age, and the dates of different medical outcomes such as admission to the intensive care unit (ICU), discharge, or death. The aim was to estimate the time from hospital ward until admission to the ICU while adjusting for age and sex. In our analysis we included only 2380 patients who had been hospitalized for at least a day. Among them, 8.3% were admitted to the ICU and 91.7% were censored. In the censored group, 68.8% of the patients were discharged from the hospital alive without needing the ICU, and 13.8% died without entering the ICU. These patients were therefore identified as “cured” of the event of interest, which is admission to the ICU. Note that in this example “being cured” means never being admitted to the ICU, not being cured in the medical sense.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work has been supported by MINECO grant MTM2017-82724-R and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-014). We also wish to acknowledge the support received from the Centro de Investigación de Galicia “CITIC”, funded by the Xunta de Galicia and the European Union (European Regional Development Fund Galicia 2014–2020 Program) through grant ED431G 2019/01. The authors are grateful to Andrés Paz-Ares Rodríguez (General Director of Public Health), Xurxo Hervada Vidal (General Deputy Director of Information on Health and Epidemiology), and Benigno Rosón Calvo (General Deputy Director of the SERGAS information system) for providing the COVID-19 data.

References

1. Peng, Y.; Yu, B. Cure Models: Methods, Applications, and Implementation; Chapman and Hall/CRC: Boca Raton, FL, USA, 2021.
2. Safari, W.C.; López-de-Ullibarri, I.; Jácome, M.A. A product-limit estimator of the conditional survival function when cure status is partially known. Biometr. J. 2021, 63, 984–1005.
3. Safari, W.C.; López-de-Ullibarri, I.; Jácome, M.A. Nonparametric Kernel Estimation of the Probability of Cure in a Mixture Cure Model When the Cure Status Is Partially Observed. Submitted. 2021. Available online: https://dm.udc.es/preprint/main_paper_cure_rate_Safari_et_al.pdf (accessed on 29 September 2021).

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
