Bad input shape: how to interpret and diagnose? Also a side ML question
I apologize, I am an ML novice, but I am trying to learn. I am building a classifier on this dataset to predict mental health disorders from features such as age, ethnicity and gender. I wanted to run a very simple Naive Bayes (NB) classifier, but I keep getting a "bad input shape" error. Unfortunately, I am having trouble working out where the error comes from and how to troubleshoot it. Any guidance? (Ignore the extra imports at the top; I was trying different things. I assume the problem is in how I pass the data to the model.)
Namely, for each of these labels (diagnoses) I want an output showing its presence or absence [0 or 1], based on the features (which are numeric).
Feature names: ['YEAR', 'AGE', 'EDUC', 'ETHNIC', 'RACE']
Values: [ 9, -9, 4 , 2]
Labels: ['ADHDFLG', 'CONDUCTFLG', 'DELIRDEMFLG', 'BIPOLARFLG', 'DEPRESSFLG', 'ODDFLG', 'PDDFLG', 'PERSONFLG', 'SCHIZOFLG', 'ALCSUBFLG']
Corresponding label values: [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
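To make the layout described above concrete, here is a minimal sketch of how the shapes involved could be inspected. It uses made-up arrays rather than the real dataset; the names `X`, `Y`, `y_single` and the row count of 6 are assumptions for illustration only.

```python
import numpy as np

# Synthetic stand-ins for the real data (assumption): 6 rows,
# 4 numeric features and 10 binary diagnosis flags per row.
rng = np.random.default_rng(0)
X = rng.integers(-9, 10, size=(6, 4))
Y = rng.integers(0, 2, size=(6, 10))

print(X.shape)  # (n_samples, n_features)
print(Y.shape)  # (n_samples, n_labels): 2-D, one column per flag

# A single-output classifier is normally given a 1-D target,
# i.e. one label column at a time.
y_single = Y[:, 0]
print(y_single.shape)
```

Printing the shapes this way shows that the label array is two-dimensional (one column per diagnosis), which is worth comparing against what the chosen classifier accepts as a target.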
Also, a side question: does anyone have recommendations for other machine learning tasks I could try with this dataset? I am doing this for a class and am trying to push myself to learn new topics. Thanks in advance!
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
import scipy
from sklearn.model_selection import train_test_split
df = pd.read_csv("https://csprojectdatavisualizationsample50k.s3.us-east-2.amazonaws.com/sample_df.csv")
df_columns = df.columns
df_feature_names = (df_columns[1:6]).to_list()
df_features = df.iloc[:,2:6].values
df_label_names = (df_columns[26:36]).to_list()
df_labels = df.iloc[:, 26:36].values
#Input
print(df_label_names)
# Split our data
train, test, train_labels, test_labels = train_test_split(df_features,
                                                          df_labels,
                                                          test_size=0.50,
                                                          random_state=42)
print(train.shape)
print(test.shape)
# Initialize our classifier
gnb = GaussianNB()
# Train our classifier
model = gnb.fit(train, train_labels)
# Make predictions
preds = gnb.predict(test)
print(preds)
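One possible direction, sketched under assumptions: `GaussianNB` is a single-output classifier, so passing a 2-D label array with 10 columns to `fit` is a likely source of the shape error. scikit-learn's `MultiOutputClassifier` wraps a base estimator and fits one copy per label column. The example below uses synthetic data in place of the real CSV; the array names, sizes and random values are illustrative assumptions, not the actual dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption): 200 rows, 4 numeric features,
# 10 binary diagnosis flags.
rng = np.random.default_rng(42)
X = rng.integers(-9, 10, size=(200, 4)).astype(float)
Y = rng.integers(0, 2, size=(200, 10))

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.50, random_state=42
)

# Fit one independent GaussianNB per label column.
clf = MultiOutputClassifier(GaussianNB())
clf.fit(X_train, Y_train)

# Predictions have one 0/1 column per label.
preds = clf.predict(X_test)
print(preds.shape)
```

This treats each diagnosis flag as an independent binary problem, so it ignores any correlation between diagnoses. Whether that is acceptable depends on the task.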
Topic naive-bayes-classifier machine-learning
Category Data Science