Memory issues with AalenAdditiveFitter in the lifelines package in Python

We are working on a survival analysis problem. We have already implemented a Cox proportional hazards model and an accelerated failure time model. Now we want to see how the covariates change over time, so we decided to use AalenAdditiveFitter from the lifelines library. Some dummy data is shown below; the real data has shape (1341799, 4).

             Gender   Disability_level   Time_to_event   Event

    1        Female   Mild               50              0
    2        Male     Moderate           70              1
    3        Male     Severe
    .
    .
    .
    1341799  Female   Mild               45              1

The problem we are facing is related to memory. After one-hot encoding, the data shape becomes (1341799, 15). As far as we understand, AalenAdditiveFitter transposes the given data matrix and does some internal modifications, and the number of columns increases from 15 to 1904. Here is the error we are getting:

MemoryError: Unable to allocate 19.0 GiB for an array with shape (1904, 1341799) and data type float64
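
As a sanity check on the figure in the error message, the requested size follows directly from the reported shape and dtype. This is a minimal sketch using only the numbers quoted in the error; the variable names are illustrative and not part of lifelines.

```python
# Shape and dtype copied from the MemoryError message above.
n_rows, n_cols = 1904, 1_341_799
bytes_per_float64 = 8

total_bytes = n_rows * n_cols * bytes_per_float64
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

print(f"{total_bytes:,} bytes = {total_gib:.1f} GiB")
# prints "20,438,282,368 bytes = 19.0 GiB"
```

So the 19.0 GiB is for this one dense float64 array alone, before counting any other objects held in memory at the same time.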

The code works fine when we reduce the number of rows, but that does not serve our purpose. Can anyone explain what is going on under the hood? Is there a workaround for this problem? Is there another method that captures how the covariates vary over time?



Reading the code for AalenAdditiveFitter, there are a couple of reasons for the large memory use.

The primary issue is that it uses pandas DataFrames, which are memory-inefficient (especially compared to NumPy arrays).

Another issue is that the .fit() method makes copies of those large, inefficient pandas DataFrames:

X, T, E, weights = self._preprocess_dataframe(df)

self.durations = T.copy()
self.event_observed = E.copy()
self.weights = weights.copy()
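
To see how much each of these factors contributes on your own data, one option is to measure the footprint directly. The following is a rough sketch on made-up toy data shaped like the table in the question. The column values, `n_rows`, and all variable names are assumptions for illustration; only standard pandas and NumPy calls are used, and nothing here reproduces what lifelines does internally.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data shaped like the question's table (values are made up).
rng = np.random.default_rng(0)
n_rows = 10_000
df = pd.DataFrame({
    "Gender": rng.choice(["Female", "Male"], size=n_rows),
    "Disability_level": rng.choice(["Mild", "Moderate", "Severe"], size=n_rows),
    "Time_to_event": rng.integers(1, 100, size=n_rows),
    "Event": rng.integers(0, 2, size=n_rows),
})

# One-hot encode the categorical columns as float64, the dtype named in the error.
encoded = pd.get_dummies(
    df, columns=["Gender", "Disability_level"], dtype=np.float64
)

# Bytes held by the DataFrame (index and column buffers included).
frame_bytes = int(encoded.memory_usage(deep=True).sum())

# Bytes held by the same values as a plain float64 NumPy array.
array_bytes = encoded.to_numpy(dtype=np.float64).nbytes

# A deep copy allocates new buffers, so keeping both adds its size on top.
copy_bytes = int(encoded.copy().memory_usage(deep=True).sum())

print(f"DataFrame:        {frame_bytes:,} bytes")
print(f"NumPy array:      {array_bytes:,} bytes")
print(f"DataFrame + copy: {frame_bytes + copy_bytes:,} bytes")
```

Running the same measurements on the real frame, and scaling by the row count, gives a rough idea of whether the DataFrame overhead, the copies, or the single large allocation in the error dominates.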
