How to use the fillna method in a for loop

I am working on a housing dataset. In a list of columns (Garage, Fireplace, etc.), some values are NA, which simply means that the house in question does not have that feature (garage, fireplace). It does not mean that the value is missing or unknown. However, pandas interprets NA as NaN, which is wrong. To get around this, I want to replace NA with XX so that these values can be distinguished from genuine NaN values. Because there is a whole list of such columns, I want to use a for loop to do this in a few lines of code:

na_data = ['Alley', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'FireplaceQu', 'GarageType',
           'GarageFinish', 'GarageQual', 'GarageCond', 'PoolQC', 'Fence', 'MiscFeature']

for i in range(len(na_data)):
    train[i] = train[i].fillna('XX')

I know this isn't the correct way of doing it, as it gives me a KeyError: 0. It is more of a pseudocode sketch to show what I'm trying to accomplish. What is the right way to apply fillna('XX') to every column in this list?


Mine works this way:

# data_train.columns
na_data = ['Gender', 'Married', 'Dependents','Self_Employed','Credit_History','Loan_Amount_Term']

for i in na_data:
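    # Fill with the most frequent value; assign back rather than using
    # inplace=True on a selected column, which newer pandas may not propagate.
    data_train[i] = data_train[i].fillna(data_train[i].mode()[0])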
    print(i)

These results are from print(i), just for confirmation.

#Gender
#Married
#Dependents
#Self_Employed
#Credit_History
#Loan_Amount_Term
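
Applied to the original question, the same pattern of looping over the column names (rather than over range(len(...))) should work with a constant fill value. A minimal sketch, assuming the NA entries have already been read in as NaN; the small `train` frame below is made up for illustration and is not the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the housing data; the real `train` has many more columns.
train = pd.DataFrame({
    "Alley": ["Grvl", np.nan, "Pave"],
    "Fence": [np.nan, "MnPrv", np.nan],
    "LotArea": [8450.0, np.nan, 11250.0],  # genuinely missing value, should stay NaN
})

na_data = ["Alley", "Fence"]

# Iterate over the column labels themselves. train[0] raises KeyError: 0
# because 0 is not a column label in this frame.
for col in na_data:
    train[col] = train[col].fillna("XX")

# Equivalent without a loop:
# train[na_data] = train[na_data].fillna("XX")
```

Only the listed columns are touched, so NaN values in other columns (LotArea here) are left as missing.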

While replace is a valid approach, it can be inefficient and slow on a large scale - see this question.

You should instead use map to encode NA as XX - perhaps something like this:

na_data = ['Alley', ...,'Fence', 'MiscFeature']
for col in na_data:
    # map() with a dict turns unmapped values into NaN, so restore the originals
    train[col] = train[col].map({'NA': 'XX'}).fillna(train[col])
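
One thing to keep in mind with this approach: Series.map with a plain dict turns every value that is not a key of the dict into NaN, so a fallback is needed to keep the other categories. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series(["NA", "Gd", "TA"])  # made-up category values for illustration

# Dict only: "Gd" and "TA" are not keys, so they come back as NaN.
dict_only = s.map({"NA": "XX"})

# Dict plus fallback: fill the unmapped positions from the original series.
with_fallback = s.map({"NA": "XX"}).fillna(s)
```

Here `dict_only` keeps only the first entry, while `with_fallback` preserves "Gd" and "TA" alongside the new "XX".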

What you are looking for is replace().

You also don't need to write out all the columns; you can simply iterate over the column names.

for col in train:
    train[col] = train[col].replace("NA", "XX")

You can do it on the whole dataset in one line:

train.replace("NA","XX", inplace=True)

Or on specific columns:

for col in na_data:
    train[col] = train[col].replace("NA", "XX")
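
Note that replace("NA", "XX") only matches the literal string "NA". If the file was loaded with read_csv using its defaults, those entries have probably already become NaN, and there is no string left to replace. One possible way around this is to stop the conversion at load time. A minimal sketch, assuming the data comes from a CSV file; the inline text below is made up and stands in for the real training file:

```python
import io
import pandas as pd

# Made-up CSV text standing in for the real training file.
csv_text = "Id,Alley,LotFrontage\n1,NA,65\n2,Grvl,\n3,NA,80\n"

# Default parsing: the string "NA" is read as NaN.
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False switches off the built-in NaN markers;
# na_values=[""] still treats empty fields as missing.
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])
```

With `kept`, the Alley column holds the string "NA", which replace("NA", "XX") can then match. If other columns use NA for values that really are missing, na_values also accepts a dict mapping column names to their own missing-value markers.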
