Merge two dataframes on multiple columns, only if not NaN
Given two Pandas dataframes, how can I use the second dataframe to fill in missing values, given multiple key columns?
Col1 Col2 Key1 Key2 Extra1 Col1 Col2 Key1 Key2 Col1 Col2 Key1 Key2 Extra1
-------------------------------------------------------------------------------------------------------------------
[A, B, 1.10, 1.11, Alice] [A, B, 1.10, 1.11, Alice] # left df has more non-NaNs, so leave it
[C, D, 2.10, 2.11, Bob] [np.nan, np.nan, 1.10, 1.11] [C, D, 2.10, 2.11, Bob] # unmatched row should still exist
[np.nan, np.nan, 3.10, 3.11, Charlie] + [E, F, 3.10, 3.11] = [E, F, 3.10, 3.11, Charlie] # left df has NaN, so fill in values
[I, np.nan, 5.10, 5.11, Destiny] [G, H, 4.10, 4.11] [I, np.nan, 5.10, 5.11, Destiny] # no matching values in second df, so leave it
[np.nan, J, 6.10, 6.11, Evan] [np.nan, J, 6.10, 6.11, Evan] # no matching values in second df, so leave it
My attempt:
import pandas as pd
import numpy as np
data = [
[A, B, 1.10, 1.11, Alice],
[C, D, 2.10, 2.11, Bob],
[np.nan, np.nan, 3.10, 3.11, Charlie],
[I, np.nan, 5.10, 5.11, Destiny],
[np.nan, J, 6.10, 6.11, Evan],
]
df1 = pd.DataFrame(data, columns = ['common_1', 'common_2', 'common_3', 'common_4', 'extra_1'])
data = [
[np.nan, np.nan, 1.10, 1.11],
[E, F, 3.10, 3.11],
[G, H, 4.10, 4.11],
]
df2 = pd.DataFrame(data, columns = ['common_1_other', 'common_2_other', 'common_3_other', 'common_4_other'])
df3 = pd.merge(
df1,
df2,
how=left,
left_on=[common_3, common_4],
right_on=[common_3_other, common_4_other],
)
Want df3 to be:
[A, B, 1.10, 1.11, Alice]
[C, D, 2.10, 2.11, Bob]
[E, F, 3.10, 3.11, Charlie]
[I, np.nan, 5.10, 5.11, Destiny]
[np.nan, J, 6.10, 6.11, Evan]
Topic data-imputation pandas python
Category Data Science