Training & Test feature shape is different from number of columns in dataset

Question

Victor Melvin · June 24, 2021 20:18

I am building a Sequential neural network for regression with 3 Dense layers, to be trained on a simple dataset. But even before I get to the part of the code that runs the model, my feature matrix has a different number of columns than the dataset. The dataset's columns are:

1) one categorical Name column, which is one-hot encoded
2) 20 other columns of integers/floats

So I have 21 features. The ValueError says the model expects 21 features but receives 36. When I check X.shape it is (98, 36), yet my dataset has 98 rows by 21 columns. There are only 21 features in my dataset, so how does it end up with 36?

As a result, I get this error when I try to run my Keras model:

ValueError: Input 0 of layer sequential_1 is incompatible with the layer: expected axis -1 of input shape to have value 21 but received input with shape (None, 36)

Here is the code I use to import and clean the dataset:

Import Dataset

import pandas as pd

N_df_1 = pd.read_csv('/', error_bad_lines=False) # I can't show the dataset paths
N_df_2 = pd.read_csv('/', error_bad_lines=False)
N_df_3 = pd.read_csv('/', error_bad_lines=False)
N_df_4 = pd.read_csv('/', error_bad_lines=False)
N_df_5 = pd.read_csv('/', error_bad_lines=False)
N_df_6 = pd.read_csv('/', error_bad_lines=False)
N_df_7 = pd.read_csv('/', error_bad_lines=False)
N_df_8 = pd.read_csv('/', error_bad_lines=False)
N_df_9 = pd.read_csv('/', error_bad_lines=False)
N_df_10 = pd.read_csv('/', error_bad_lines=False)
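As a side note on the loading step: pandas 1.3 deprecated `error_bad_lines=False` in favor of `on_bad_lines="skip"`, and pandas 2.0 removed the old flag. The sketch below, assuming pandas 1.3 or later, reads several files in one list comprehension instead of ten separate variables. The CSV text is made up and passed through `io.StringIO`, because the real paths are not shown.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the real CSV files, whose paths are not shown.
# The "Bob" row has one field too many, so pandas treats it as a bad line.
fake_files = [
    "Name,a,b\nAlice,1,2\nBob,3,4,99\nCara,5,6\n",
    "Name,a,b\nDan,7,8\n",
]

# on_bad_lines="skip" replaces the older error_bad_lines=False (pandas >= 1.3).
frames = [pd.read_csv(io.StringIO(text), on_bad_lines="skip") for text in fake_files]

print([len(df) for df in frames])  # rows kept from each file
```

The resulting `frames` list can be passed straight to `pd.concat`.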

Cleaning data

# Combine the ten DataFrames row-wise (the closing ] of the list was missing)
N_df = pd.concat([N_df_1, N_df_2, N_df_3, N_df_4, N_df_5, N_df_6, N_df_7, N_df_8, N_df_9, N_df_10], ignore_index=False, axis=0)

# Drop rows in which every value is NaN (rows that are only partly NaN are kept)
N_df.dropna(axis=0, how='all', inplace=True)
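A small sketch of the concatenation and cleaning steps on made-up frames (the frames and their columns are hypothetical). It shows that `dropna(how="all")` only removes rows where every value is missing, whereas `how="any"` removes every row with at least one missing value. It uses `ignore_index=True`, which gives the result a fresh 0..n-1 index; the question's `ignore_index=False` keeps each file's original row labels, so labels repeat across files.

```python
import numpy as np
import pandas as pd

# Two small hypothetical frames standing in for N_df_1 ... N_df_10.
df_a = pd.DataFrame({"Name": ["Alice", np.nan], "a": [1.0, np.nan]})
df_b = pd.DataFrame({"Name": ["Bob", "Cara"], "a": [np.nan, 3.0]})

# Passing a list lets pd.concat stack any number of frames row-wise.
combined = pd.concat([df_a, df_b], ignore_index=True, axis=0)

# how="all": drop only rows where every value is NaN (the empty second row).
all_nan_dropped = combined.dropna(axis=0, how="all")
# how="any": drop every row that contains at least one NaN.
any_nan_dropped = combined.dropna(axis=0, how="any")

print(len(combined), len(all_nan_dropped), len(any_nan_dropped))
```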

Encoding Categorical data

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# One-hot encode column 0 (Name); pass the other columns through unchanged
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(N_df))
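A minimal sketch for checking where the extra columns come from, using the same `ColumnTransformer` layout on a hypothetical toy frame with a 3-value Name column and 2 numeric columns. The output width is the number of distinct names plus the passthrough columns. `sparse_threshold=0` is an addition not in the question: it forces dense output, so `np.asarray` never receives a SciPy sparse matrix (which `ColumnTransformer` may return when the combined output is mostly zeros).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: a Name column with 3 distinct values plus 2 numeric columns.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Alice"],
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [10, 20, 30, 40],
})

# Same layout as the question: encode column 0, pass the rest through.
# sparse_threshold=0 always returns a dense array.
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X = np.asarray(ct.fit_transform(df))

n_names = df["Name"].nunique()   # one output column per distinct name
n_passthrough = df.shape[1] - 1  # numeric columns kept as they are
print(X.shape, n_names + n_passthrough)
```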

Topics: features, keras, feature-engineering, dataset, data-cleaning

Category: Data Science

Mohammad Ahmed · Accepted Answer · June 24, 2021 20:18

This happens because you are applying one-hot encoding, which replaces a categorical column with one new column per distinct category. Suppose you have a binary variable (Male/Female). In your data file it is a single column whose values are either Male or Female.

When you one-hot encode this feature, that single column becomes two columns, Male and Female, and each row has a 1 in the column matching its value and a 0 in the other.
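To make the Male/Female example concrete, here is a sketch using pandas `get_dummies`, a swap-in for the question's scikit-learn `OneHotEncoder`, which expands a column the same way by default. The `people` frame is hypothetical.

```python
import pandas as pd

# Hypothetical single column holding a binary category.
people = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

# One-hot encoding turns the single Gender column into one column per category.
encoded = pd.get_dummies(people, columns=["Gender"], dtype=int)
print(encoded.columns.tolist())
print(encoded.to_numpy().tolist())
```

One input column with two categories becomes two output columns; a column with 16 categories would become 16.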

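In your case, the 20 numeric columns pass through unchanged, while the Name column is expanded into one column per distinct name. X has 36 columns, and 36 - 20 = 16, so your Name column contains 16 distinct names. The model was built to expect 21 input features, which is why it rejects the 36-column input with a shape error; its input size has to match X.shape[1].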
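The arithmetic above can be checked before the model is built. A sketch, assuming the Name column has no missing values; the helper name `one_hot_width` and the random frame are hypothetical, shaped like the question's data (98 rows, 16 distinct names, 20 numeric columns).

```python
import numpy as np
import pandas as pd


def one_hot_width(df: pd.DataFrame, categorical_col: str) -> int:
    """Number of columns after one-hot encoding `categorical_col` and
    passing every other column through (assumes no missing values in it)."""
    n_categories = df[categorical_col].nunique()
    n_passthrough = df.shape[1] - 1
    return n_categories + n_passthrough


# Hypothetical frame shaped like the question's data.
rng = np.random.default_rng(0)
data = {"Name": [f"name_{i % 16}" for i in range(98)]}
for j in range(20):
    data[f"x{j}"] = rng.normal(size=98)
df = pd.DataFrame(data)

print(df.shape)                   # 21 columns before encoding
print(one_hot_width(df, "Name"))  # 16 names + 20 numeric columns after encoding
```

On the Keras side, taking the input size from `X.shape[1]` (for example `keras.Input(shape=(X.shape[1],))`) instead of hard-coding 21 keeps the model in step with whatever the encoder produces.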
