How to find median/average values between data frames with slightly different columns?
I am trying to reduce run-to-run variance in the data I collect by combining the data from different runs and taking the mean (or median) of each value. The problem is that in each run there is a chance that some of the features do not appear:

   x  y  z
0  0  2  2
1  0  1  3
2  5  3  0
3  1  1  0
4  0  2  0

   x  y  d
0  1  0  2
1  1  1  3
2  0  4  2
3  0  2  0
4  0  2  1

   z  y
0  0  2
1  0  1
2  0  2
3  1  0
4  3  0

As you can see from this example, the rows are always consistent, but some runs might provide fewer columns than the rest. Therefore, in a theoretical dataframe where all the columns are averaged, the values in some columns would have to be divided by a smaller number than in others (in this case the values in the y column would have to be divided by 3, but those in the x column by 2).
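To make the desired result concrete, here is a minimal sketch of one way this might be done in pandas. The names `run1`, `run2`, `run3`, `stacked`, `mean_df` and `median_df` are made up for illustration, and the sketch assumes the three frames above share the same row index. Stacking the runs and grouping on the row index should give per-cell averages in which a column that is missing from a run is simply skipped, because the missing cells become NaN and NaN is ignored by `mean` and `median`.

```python
import pandas as pd

# The three example runs from above (names are illustrative only).
run1 = pd.DataFrame({"x": [0, 0, 5, 1, 0], "y": [2, 1, 3, 1, 2], "z": [2, 3, 0, 0, 0]})
run2 = pd.DataFrame({"x": [1, 1, 0, 0, 0], "y": [0, 1, 4, 2, 2], "d": [2, 3, 2, 0, 1]})
run3 = pd.DataFrame({"z": [0, 0, 0, 1, 3], "y": [2, 1, 2, 0, 0]})

# Stack the runs on top of each other; columns absent from a run are filled with NaN.
stacked = pd.concat([run1, run2, run3])

# Group rows that share the same index label and aggregate.
# NaN cells are skipped, so each column is divided by the number of runs that contain it.
mean_df = stacked.groupby(level=0).mean()
median_df = stacked.groupby(level=0).median()

print(mean_df)
print(median_df)
```

With this approach, y is averaged over three runs, x and z over two, and d over one.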
Bonus question: Is there a way to make this row-specific, i.e. do the same thing but ignore the 0s? In my case a 0 indicates no data, so it might interfere with the results. For example, y for row 0 has one zero, so the average would be $\frac{2+2}{2}$, whereas in row 1 it would be $\frac{1+1+1}{3}$.
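A sketch of what the zero-ignoring variant might look like, under the same assumptions (illustrative names `run1`, `run2`, `run3`, `stacked`, `nonzero_mean`; shared row index): the zeros are masked to NaN before aggregating, so they are treated the same way as a missing column. A cell whose values are all zero or missing would then come out as NaN rather than 0.

```python
import pandas as pd

# Same example runs as in the question (names are illustrative only).
run1 = pd.DataFrame({"x": [0, 0, 5, 1, 0], "y": [2, 1, 3, 1, 2], "z": [2, 3, 0, 0, 0]})
run2 = pd.DataFrame({"x": [1, 1, 0, 0, 0], "y": [0, 1, 4, 2, 2], "d": [2, 3, 2, 0, 1]})
run3 = pd.DataFrame({"z": [0, 0, 0, 1, 3], "y": [2, 1, 2, 0, 0]})

stacked = pd.concat([run1, run2, run3])

# Treat 0 as "no data": replace every zero cell with NaN.
stacked = stacked.mask(stacked == 0)

# Per-row, per-column mean over the runs that actually have a non-zero value.
nonzero_mean = stacked.groupby(level=0).mean()

print(nonzero_mean)
```

Here y in row 0 is averaged over two values and y in row 1 over three, matching the fractions above.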
Topic groupby data-analysis dataframe pandas data-cleaning
Category Data Science