How to use stratified splitting in sklearn to create multiple files

I have a CSV data file for binary classification. I tried to divide it into 5 files using stratification, so that the class label has the same proportion in every file, but I am getting the error:

ValueError: Found input variables with inconsistent numbers of samples:

even though the total number of rows is divisible by 5. I think the splitter takes a pandas DataFrame as input, and I am asking it to stratify by a specific column. The output is a NumPy array that does not have column names. How can I do this?

from sklearn.model_selection import train_test_split
import pandas as pd

df = pd.read_csv('C:/data1.csv')
# First split: df and df['label'] have the same number of rows, so this call runs.
train1, val1 = train_test_split(df, random_state=1, stratify=df['label'])
# This call raises the ValueError: train1 has fewer rows than df['label'].
train2, val2 = train_test_split(train1, test_size=0.20, random_state=1, stratify=df['label'])
# The remaining calls have the same length mismatch between the data and stratify.
train3, val3 = train_test_split(train2, test_size=0.25, random_state=1, stratify=df['label'])
train4, val4 = train_test_split(train3, test_size=0.33, random_state=1, stratify=df['label'])
train5, val5 = train_test_split(train4, test_size=0.50, random_state=1, stratify=df['label'])

val1.to_csv('1.csv', index=False)
val2.to_csv('2.csv', index=False)
val3.to_csv('3.csv', index=False)
val4.to_csv('4.csv', index=False)
val5.to_csv('5.csv', index=False)
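
For reference, the error comes from the `stratify` argument: it must have the same number of rows as the data being split, so passing the full `df['label']` together with a smaller frame such as `train1` fails. Also, `train_test_split` returns DataFrames when it is given a DataFrame, so the column names are preserved. One possible way to get five disjoint, stratified parts in a single pass is `StratifiedKFold`, using each fold's test indices as one part. The sketch below is only an illustration under assumptions: `split_stratified` is a hypothetical helper name, the column is assumed to be called `label` as in the question, and a small synthetic frame stands in for `data1.csv`.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def split_stratified(df, label_col="label", n_parts=5, seed=1):
    """Return n_parts disjoint DataFrames with roughly equal label proportions.

    Hypothetical helper, not part of scikit-learn.
    """
    skf = StratifiedKFold(n_splits=n_parts, shuffle=True, random_state=seed)
    # The test indices of the folds are disjoint and together cover every row,
    # so each fold's test set can serve as one output part.
    return [df.iloc[test_idx] for _, test_idx in skf.split(df, df[label_col])]


# Synthetic stand-in for the CSV (assumption): 100 rows, 30% positive labels.
demo = pd.DataFrame({"feature": range(100), "label": [1] * 30 + [0] * 70})
parts = split_stratified(demo)

for i, part in enumerate(parts, start=1):
    # Each part is still a DataFrame, so column names are kept.
    print(i, len(part), part["label"].mean())
```

Each part could then be saved with something like `part.to_csv(f'{i}.csv', index=False)`. If the sequential `train_test_split` approach is preferred instead, the stratify column would need to come from the frame being split at each step, for example `stratify=train1['label']` in the second call.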

Topic sampling cross-validation scikit-learn machine-learning

Category Data Science
