How to find median/average values between data frames with slightly different columns?

I am trying to combat run-to-run variance of the data I collect by combining the data from different runs and finding the mean/average. The problem is that in each run there is a chance that some of the features may not appear:

   x    y    z
0  0    2    2 
1  0    1    3 
2  5    3    0
3  1    1    0
4  0    2    0

   x    y    d
0  1    0    2 
1  1    1    3 
2  0    4    2  
3  0    2    0
4  0    2    1  

   z    y    
0  0    2
1  0    1 
2  0    2 
3  1    0
4  3    0

As you can see from this example, the rows are always consistent, but some runs might provide less columns than the rest. Therefore in a theoretical dataframe where all the columns are averaged, in some columns the values would have to be divided by a lower number than others (in this case the values in the y column will have to be divided by 3, but in the x column - by 2).

Bonus question: Is there a way make this row-specific: do the same thing, but not take into account the 0s, since in my case that indicates no data, so it might interfere with the results (y for row 0 has one zero, so the average will be $(2+2)\over 2$, whereas in row 1 it would be $(1+1+1)\over3$.

Topic groupby data-analysis dataframe pandas data-cleaning

Category Data Science


Assuming that you have the data stored in separate dataframes, you can use a combination of pandas.concat and pandas.DataFrame.groupby to achieve what you are looking for:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    "x": [0,0,5,1,0],
    "y": [2,1,3,1,2],
    "z": [2,3,0,0,0]
})

df2 = pd.DataFrame({
    "x": [1,1,0,0,0],
    "y": [0,1,4,2,2],
    "z": [2,3,2,0,1]
})

df3 = pd.DataFrame({
    "y": [2,1,2,0,0],
    "z": [0,0,0,1,3]
})

df = (
    # combine dataframes into a single dataframe
    pd.concat([df1, df2, df3])
    # replace 0 values with nan to exclude them from mean calculation
    .replace(0, np.nan)
    .reset_index()
    # group by the row within the original dataframe
    .groupby("index")
    # calculate the mean
    .mean()
)
index x y z
0 1 2 2
1 1 1 3
2 5 3 2
3 1 1.5 1
4 nan 2 2

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.