Kaplan-Meier and Nelson-Aalen Estimators

When the empirical data are incomplete (truncated or censored), raw empirical estimators, which treat every observation as complete, will not produce good results.  In this scenario, two techniques are available for estimating the distribution from the data.  The Kaplan-Meier product-limit estimator can be used to estimate the survival function.  The Nelson-Aalen estimator can be used to estimate the cumulative hazard rate function.  The Kaplan-Meier estimator is given by:

S_n(t) = \displaystyle \prod_{i=1}^{j-1} \left(1-\frac{s_i}{r_i}\right), \quad y_{j-1} \le t < y_j

where y_1 < y_2 < ... are the distinct times at which the event is observed, r_i is the size of the risk set at time y_i, and s_i is the number of observed occurrences at time y_i of the random event whose distribution you are trying to estimate.  For example, if the random event you are interested in is death, then r_1 could be the number of life insurance policy holders immediately prior to the first death, and s_1 would be the number of observed deaths at the time of the first death (you can have simultaneous deaths).  The key to dealing with problems that use this estimator is to understand how r_i changes with respect to censoring or truncation.  If a person withdraws from the life insurance policy, this decreases the risk set at later times, but it is not a death, so it does not contribute to s_i.  If new members join at time y_i, they are not part of the risk set until time y_{i+1}.
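The bookkeeping for r_i and s_i can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function name kaplan_meier, the (entry, exit, died) record layout, and the sample data are all hypothetical, and every record is assumed to have entry < exit.  A record counts toward the risk set at y_i only if entry < y_i <= exit, which matches the rule that someone joining at y_i is not at risk until y_{i+1}.

```python
from fractions import Fraction

def kaplan_meier(records):
    """Product-limit estimate from (entry, exit, died) records.

    entry: time the observation came under observation (truncation point)
    exit:  time of death or of withdrawal (censoring)
    died:  True if exit is an observed death, False if it is a withdrawal
    Returns a list of (y_i, r_i, s_i, S) where S applies from y_i up to,
    but not including, the next death time.
    """
    death_times = sorted({x for _, x, died in records if died})
    surv = Fraction(1)
    table = []
    for y in death_times:
        # Risk set: entered strictly before y and still present at y.
        r = sum(1 for d, x, _ in records if d < y <= x)
        # Observed deaths exactly at y (withdrawals are not counted).
        s = sum(1 for _, x, died in records if died and x == y)
        surv *= 1 - Fraction(s, r)
        table.append((y, r, s, surv))
    return table

# Hypothetical data: one withdrawal at time 2, one late entrant at time 1
# who withdraws at 4, and one late entrant at time 2 who dies at 5.
records = [
    (0, 1, True), (0, 2, False), (0, 3, True),
    (0, 3, True), (1, 4, False), (2, 5, True),
]
for y, r, s, surv in kaplan_meier(records):
    print(y, r, s, surv)
# 1 4 1 3/4
# 3 4 2 3/8
# 5 1 1 0
```

In this made-up example the person who enters at time 1 is not in the risk set for the death at time 1, and the withdrawal at time 2 lowers the risk set at time 3 without adding to s_i.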

If the data are censored past a certain point, you can assume an exponential distribution for the tail beyond that point.  Suppose observations past c are censored.  If you know the value of S_n(c), you can solve S_n(c) = e^{-c/\theta} for \theta, which gives \theta = -c / \ln S_n(c).  The survival function for t > c is then e^{-t/\theta}.
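As a rough numerical sketch of the exponential tail, the snippet below solves for \theta and evaluates the fitted tail.  The function name exponential_tail and the values c = 10 and S_n(c) = 0.4 are made up for illustration, and the calculation assumes 0 < S_n(c) < 1.

```python
import math

def exponential_tail(c, s_c):
    """Fit an exponential tail through the point (c, s_c).

    Solves s_c = exp(-c / theta) for theta and returns theta together
    with a function giving the tail survival probability for t >= c.
    Assumes 0 < s_c < 1.
    """
    theta = -c / math.log(s_c)

    def survival(t):
        return math.exp(-t / theta)

    return theta, survival

# Hypothetical values: observations past c = 10 are censored, S_n(10) = 0.4.
theta, tail = exponential_tail(10, 0.4)
print(round(theta, 4))     # 10.9136
print(round(tail(20), 4))  # 0.16
```

Because e^{-t/\theta} = S_n(c)^{t/c}, the tail value at t = 2c is simply S_n(c) squared, which gives a quick check on the arithmetic.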

The Nelson-Aalen cumulative hazard rate estimator is given by:

\tilde H(t) = \displaystyle \sum_{i=1}^{j-1} \frac{s_i}{r_i}, \quad y_{j-1} \le t < y_j

You can use this to get a survival function:

\tilde S(t) = e^{-\tilde H(t)}
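A short sketch of the Nelson-Aalen calculation is given below.  The function name nelson_aalen and the (y_i, r_i, s_i) values are hypothetical; the risk sets and death counts are assumed to have been worked out already, with censoring and truncation handled the same way as for Kaplan-Meier.

```python
import math

def nelson_aalen(steps):
    """Cumulative hazard and implied survival from (y_i, r_i, s_i) triples.

    steps must be sorted by y_i. Returns a list of (y_i, H, S) where
    H is the running sum of s_i / r_i and S = exp(-H); both apply from
    y_i up to, but not including, the next death time.
    """
    hazard = 0.0
    out = []
    for y, r, s in steps:
        hazard += s / r
        out.append((y, hazard, math.exp(-hazard)))
    return out

# Hypothetical risk sets and death counts at three death times.
for y, h, surv in nelson_aalen([(1, 4, 1), (3, 4, 2), (5, 1, 1)]):
    print(y, h, round(surv, 4))
# 1 0.25 0.7788
# 3 0.75 0.4724
# 5 1.75 0.1738
```

Note that the survival estimate obtained this way stays strictly positive even after the last observation dies, whereas the Kaplan-Meier product drops to zero when s_i = r_i.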
