Python and Titanic competition how to get the median of specific range of values where class is 3

Question

alim1990

February 15, 2020, 13:18

I am trying to solve Kaggle's Titanic competition. In the test set, only one row has a null Fare value. It would be easy to replace it with the median or mean of all Fare values.

But I would like to plot the relationship between fares and classes (1, 2, 3), and between fares and the Embarked field, to look for patterns and narrow down the range of fare values before computing a replacement for the null value.

The Embarked field has 3 main values: S, Q and C.

The Pclass field has the values 1, 2 and 3.

Fares depend on the class, of course.

What I have done so far is:

import matplotlib.pyplot as plt
import seaborn as sns

fig, axs = plt.subplots(ncols=2, nrows=2, figsize=(20, 20))
# Plot Fare against each candidate feature, one panel per feature
sns.scatterplot(x='Embarked', y='Fare', data=test, ax=axs[0][0])
sns.scatterplot(x='Age', y='Fare', data=test, ax=axs[0][1])
sns.scatterplot(x='Pclass', y='Fare', data=test, ax=axs[1][0])
sns.scatterplot(x='Sex', y='Fare', data=test, ax=axs[1][1])

And the result was:

There are no patterns in the Embarked, Age and Sex plots.

But in the Pclass plot, we can clearly see that there is a strictly defined range of fares for class 3.

I tried to check it at the train set:

So the two sets have nearly the same distribution.

How can I fill the null value with the median of the Fare values of passengers whose Pclass is 3?
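The visual comparison can also be checked numerically. The following is only a minimal sketch: the two small frames are made-up stand-ins for the real train and test sets, and only the column names Pclass and Fare come from the competition data.

```python
import pandas as pd

# Hypothetical toy frames standing in for the Kaggle train and test sets.
train = pd.DataFrame({"Pclass": [1, 2, 3, 3, 3],
                      "Fare": [70.0, 25.0, 7.5, 8.0, 9.5]})
test = pd.DataFrame({"Pclass": [1, 2, 3, 3],
                     "Fare": [75.0, 21.0, 7.0, 9.0]})

# Median fare per class, with train and test side by side as columns.
summary = pd.concat(
    {
        "train": train.groupby("Pclass")["Fare"].median(),
        "test": test.groupby("Pclass")["Fare"].median(),
    },
    axis=1,
)
print(summary)
```

If the per-class medians of the two sets are close, that supports treating the fare ranges as comparable.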

The usual expression would be:

test['Fare'] = test['Fare'].fillna(test['Fare'].median())

But I need to restrict the median to the rows where Pclass == 3.

Topic data-analysis competitions kaggle classification python

Category Data Science

Oxbowerce · Accepted Answer · February 15, 2020, 13:18

If I understand the problem correctly, you want to fill all missing values in the Fare column with the median of the Fare column where Pclass is 3. This can be achieved by applying the extra row filter test['Pclass'] == 3 before taking the median of the Fare column, see below. Note that the comparison needs ==, since a single = is an assignment and raises a syntax error here.

test['Fare'] = test['Fare'].fillna(test.loc[test['Pclass'] == 3, 'Fare'].median())
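To see the filter in action without the Kaggle files, here is a small self-contained sketch. The toy frame is an assumption for illustration, not the real test set. The second option is a common variation not asked for in the question: it fills each missing fare with the median of that row's own class, which gives the same result here because the missing row is in class 3.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Kaggle test set.
test = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3, 3],
    "Fare": [80.0, 60.0, 20.0, 7.0, 8.0, 9.0, np.nan],
})

# Option 1: fill every missing Fare with the median fare of Pclass 3.
# median() skips NaN by default, so the missing row does not affect it.
class3_median = test.loc[test["Pclass"] == 3, "Fare"].median()
filled_fixed = test["Fare"].fillna(class3_median)

# Option 2: fill each missing Fare with the median of its own Pclass.
per_class_median = test.groupby("Pclass")["Fare"].transform("median")
filled_by_class = test["Fare"].fillna(per_class_median)

print(filled_fixed.tolist())
print(filled_by_class.tolist())
```

Option 1 matches the one-liner above. Option 2 may be preferable when missing fares can occur in more than one class.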
