Multiple merges make a pandas DataFrame explode and cause memory issues in Jupyter Notebook
I have chained multiple merges on pandas DataFrames (see the example script below).
The merged DataFrame explodes and consumes more and more memory: the record count reaches 18 billion at the df3 merge, and that result is then merged with df4, which has 5 lakh (500,000) records.
This causes the memory issue: the merge consumes all of the RAM (140 GB) and the session gets killed.
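A left merge returns one row for every matching pair of keys, so an ID that appears m times on the left and n times on the right contributes m × n rows. Duplicate keys on the right-hand side are therefore a likely reason a chain of left merges grows to billions of rows. A minimal sketch, assuming the key columns contain no missing values, that estimates the output size of a left merge from per-key counts before running it; `estimate_merge_rows` is a hypothetical helper and the toy frames are made up for illustration.

```python
import pandas as pd


def estimate_merge_rows(left: pd.DataFrame, right: pd.DataFrame,
                        left_on: str, right_on: str) -> int:
    """Estimate len(left.merge(right, how='left', left_on=..., right_on=...))
    from per-key counts, without materializing the merge.
    Assumes the key columns contain no missing values."""
    left_counts = left[left_on].value_counts()
    right_counts = right[right_on].value_counts()
    # Each left key yields (left count) x (right count) rows; keys with no
    # match on the right still keep their rows in a left join, so use 1.
    matched = right_counts.reindex(left_counts.index).fillna(1)
    return int((left_counts * matched).sum())


# Toy example: ID 1 appears 2 times on the left and 3 times on the right.
left = pd.DataFrame({"ID": [1, 1, 2, 3]})
right = pd.DataFrame({"ID": [1, 1, 1, 2]})
print(estimate_merge_rows(left, right, "ID", "ID"))  # 8 (6 + 1 + 1)
```

Running it before each step of the chain (for example on df1 and df2 with 'col1' and 'col2', then on the intermediate result and df3 with 'ID') would show which merge multiplies the rows, at the cost of only two value counts per step.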
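# Chain the merges inside parentheses (no backslash continuations needed)
df = (
    df1[df1_columns]
    # Left join df1 to df2 on col1 (df1) = col2 (df2)
    .merge(df2[df2_columns], how='left', left_on='col1', right_on='col2')
    # Left join the result to df3 and then to df4 on the shared ID column
    .merge(df3[df3_columns], how='left', on='ID')
    .merge(df4[df4_columns], how='left', on='ID')
)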
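One way to confirm whether duplicate IDs in df3 or df4 are behind the explosion is the `validate` argument of `DataFrame.merge`: with `validate="many_to_one"`, pandas checks that the merge keys are unique in the right-hand frame and raises `pandas.errors.MergeError` instead of returning a multiplied result. A minimal sketch on made-up data; `safe_left_merge` and `duplicated_keys` are hypothetical helper names.

```python
import pandas as pd
from pandas.errors import MergeError


def safe_left_merge(left: pd.DataFrame, right: pd.DataFrame, on: str) -> pd.DataFrame:
    """Left merge that raises MergeError if `on` is not unique in `right`."""
    return left.merge(right, how="left", on=on, validate="many_to_one")


def duplicated_keys(df: pd.DataFrame, key: str) -> pd.Series:
    """Rows per key, restricted to keys that occur more than once."""
    counts = df[key].value_counts()
    return counts[counts > 1]


# Toy frames: ID 1 is duplicated in right_dup.
left = pd.DataFrame({"ID": [1, 2, 3], "a": [10, 20, 30]})
right_dup = pd.DataFrame({"ID": [1, 1, 2], "b": ["x", "x2", "y"]})

try:
    safe_left_merge(left, right_dup, on="ID")
except MergeError as err:
    print("merge refused:", err)
    # Shows which IDs would multiply rows and by how much
    print(duplicated_keys(right_dup, "ID"))
```

Applied to the real data, `duplicated_keys(df3, 'ID')` and `duplicated_keys(df4, 'ID')` would list the IDs that repeat, which is where the extra rows come from.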
I would appreciate any suggestions for handling these joins without exhausting memory.
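If the analysis only needs one row (or one aggregate) per ID from df3 and df4, a possible approach is to collapse each of them to one row per ID before merging; a left merge against a frame with unique keys cannot add rows. A minimal sketch, assuming that one row per ID is acceptable; the `amount` column and the `sum` aggregation are hypothetical, and the right aggregation depends on what the columns mean.

```python
import pandas as pd


def collapse_to_one_row_per_key(df: pd.DataFrame, key: str, agg: dict) -> pd.DataFrame:
    """Aggregate df so that `key` is unique; `agg` maps column -> aggregation."""
    return df.groupby(key, as_index=False).agg(agg)


# Toy data: df3 has several rows per ID.
df1 = pd.DataFrame({"ID": [1, 2, 3]})
df3 = pd.DataFrame({"ID": [1, 1, 2, 2, 2], "amount": [5, 7, 1, 2, 3]})

# Collapse df3 first (the same would apply to df4), then merge.
df3_small = collapse_to_one_row_per_key(df3, "ID", {"amount": "sum"})
merged = df1.merge(df3_small, how="left", on="ID", validate="many_to_one")
print(merged)  # 3 rows, one per ID in df1; ID 3 has no match, so amount is NaN
```

The `validate="many_to_one"` check is kept so the merge fails loudly if the collapse step ever leaves duplicate IDs behind.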
Topic jupyter azure-ml pandas python
Category Data Science