Training & Test feature shape is different from number of columns in dataset
I am building a Sequential neural network for regression with 3 dense layers, to be trained on a simple dataset. But before I even reach the code that runs the model, my feature array has a different number of columns than the dataset. The dataset's columns are:
- one categorical Name column, which is one-hot encoded
- 20 other columns, which are integers/floats
I have 21 features in my dataset. The ValueError says the model expects 21 features but received 36. When I check X.shape, it reports (98, 36), yet my dataset has 98 rows x 21 columns. How is it getting a shape of 36?
As a result, I get this error when I try to run my Keras model:
Error ValueError: Input 0 of layer sequential_1 is incompatible with the layer: expected axis -1 of input shape to have value 21 but received input with shape (None, 36)
Here is the code I use to import and clean the dataset:
Import Dataset
N_df_1 = pd.read_csv('/', error_bad_lines=False) #I can't show dataset paths
N_df_2 = pd.read_csv('/', error_bad_lines=False)
N_df_3 = pd.read_csv('/', error_bad_lines=False)
N_df_4 = pd.read_csv('/', error_bad_lines=False)
N_df_5 = pd.read_csv('/', error_bad_lines=False)
N_df_6 = pd.read_csv('/', error_bad_lines=False)
N_df_7 = pd.read_csv('/', error_bad_lines=False)
N_df_8 = pd.read_csv('/', error_bad_lines=False)
N_df_9 = pd.read_csv('/', error_bad_lines=False)
N_df_10 = pd.read_csv('/', error_bad_lines=False)
Cleaning data
#Had to combine datasets through concatenation
N_df = pd.concat([N_df_1, N_df_2, N_df_3, N_df_4, N_df_5, N_df_6, N_df_7, N_df_8, N_df_9, N_df_10], ignore_index=False, axis=0)
#Dropping rows in which every value is NaN (how='all')
N_df.dropna(axis = 0, how = 'all', inplace = True)
Encoding Categorical data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(N_df))
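For reference, here is a minimal sketch of how the encoding step above changes the column count. It uses a made-up toy DataFrame, not the real dataset; the names `toy_df`, `X_toy`, `x1`, and `x2` are hypothetical, and `sparse_threshold=0` is added only so the result is always a dense array. One-hot encoding replaces the single categorical column with one column per distinct category, so the output width is the number of distinct categories plus the number of passthrough columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column plus two numeric columns.
toy_df = pd.DataFrame({
    "Name": ["a", "b", "c", "a"],
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [10, 20, 30, 40],
})

# Same transformer setup as in the question; sparse_threshold=0 is an
# addition for this sketch so that a dense array is always returned.
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_toy = np.asarray(ct.fit_transform(toy_df))

n_categories = toy_df["Name"].nunique()   # distinct values in the encoded column
n_passthrough = toy_df.shape[1] - 1       # columns left untouched

print(toy_df.shape)                       # (4, 3)
print(X_toy.shape)                        # (4, 5)
print(n_categories + n_passthrough)       # 5
```

By the same arithmetic, a width of 36 with 20 numeric columns would be consistent with 16 distinct values in the Name column, though that is only an assumption since the data is not shown. Checking `N_df.iloc[:, 0].nunique()` on the real data would confirm or rule it out.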
Topic features keras feature-engineering dataset data-cleaning
Category Data Science