Add ID information from one dataframe to every row in another dataframe without a common key

Question

Add ID information from one dataframe to every row in another dataframe without a common key

Colin C

2022年5月3日 23:05

I am trying to combine two dataframes into the second dataframe, but duplicating the first dataframe into every row of dataframe two.

The first dataframe (df1) contains the identifing information, and the other dataframe (df2) contains data about that identifying information. They are in different dataframes since they come from different files. I am sure there is an easy way to do this, but cant seem to find it. The basic problem is there are no unique identifiers on df2 yet, and I need them from df1.

For example, I get df1 from a file containing the identifing information (this dataframe will always only have 1 row, some chance that the column names change as well):

df1:


              UID    Name       ID2
0           21YX4    Alan  21458200

while df2 comes from another file, but contains no base identifying information, like this (this dataframe can have very many rows and columns, and columns vary by file). df2's keys in this case are c and d (the index is reset), but that is not the problem, the real ID is in df1:


df2
Out[1184]: 
           a        b       c   d    e
0   1.427962       NaN  shark   0  0.0
1  -0.977258  7.000000  shark   1  NaN
2  -0.301779       NaN  python  0  NaN
3   0.393833       NaN  panda   1  1.0
4   0.494462       7.0  panda   2  0.0
5  -0.526740  5.000000  shark   2  NaN

Desired result


      UID    Name       ID2           a         b       c   d    e
0   21YX4    Alan  21458200    1.427962        NaN  shark   0  0.0
1   21YX4    Alan  21458200   -0.977258   7.000000  shark   0  NaN
2   21YX4    Alan  21458200   -0.301779        NaN  python  1  NaN
3   21YX4    Alan  21458200    0.393833        NaN  panda   1  1.0
4   21YX4    Alan  21458200    0.494462        7.0  panda   2  0.0
5   21YX4    Alan  21458200   -0.526740   5.000000  shark   2  NaN

I have tried concat, join as well as looping through the keys, but no luck so far. A join like:

df2=df2.join(df, how='outer')

puts df1 into df2, but as a new row with no data, and no ID values for the data rows. Is reshape or repeat necessary somehow or is there an easy way to do this?

Any suggestions?

Topic dataframe pandas python

Category Data Science

Colin C · Accepted Answer · 2019年8月3日 03:59

Well, it turned out I accidentally answered my own question in the title. My answer to this is to insert a new common key into both dataframes, then delete the new key after the 2 are merged. Maybe there is a more elegant way, but this works.

df.insert(0,'mykey','abcdef')
df2.insert(0,'mykey','abcdef')
df2=pd.merge(df,df2,how='outer',on='mykey')
df2.drop(columns=['mykey'])

Where mykey is the new common key, but make sure it will not match any existing columns, while 'abcdef' is random, but needs to be the same added to df1 and df2.

Add ID information from one dataframe to every row in another dataframe without a common key

About