ValueError: Data cardinality is ambiguous: (Jupyter Notebook)

I'm building an OCR system to read text off water meters. I run into the error above when I try to fit the model. I am using the segmentation_models Python library.

import segmentation_models as sm
from sklearn.model_selection import train_test_split

BACKBONE = 'resnet34'
preprocess_input = sm.get_preprocessing(BACKBONE)

x_train, y_train, x_val, y_val = train_test_split(X, y, test_size=0.2, random_state=12345)

x_train = preprocess_input(x_train)
x_val = preprocess_input(x_val)

model = sm.Unet(BACKBONE, encoder_weights='imagenet', encoder_freeze=True)
model.compile('Adam', 
   loss=sm.losses.bce_jaccard_loss,
   metrics=[sm.metrics.iou_score])

model.fit(
   x = x_train,
   y = y_train,
   batch_size=16,
   epochs=10,
   validation_data=(x_val, y_val))

'X' holds the images and 'y' holds the masks; each contains 1244 samples.

The full error:

ValueError: Data cardinality is ambiguous:
x sizes: 995
y sizes: 249
Make sure all arrays contain the same number of samples.

Please let me know if I need to post more info. I don't use this platform often.



This error occurs because the return values of train_test_split are unpacked in the wrong order. For each array you pass in, the function returns its train split followed by its test split: arr1_train, arr1_test, arr2_train, arr2_test, and so on. In your code, y_train therefore receives the validation split of X (249 images) and x_val receives the training split of y (995 masks), so model.fit sees 995 samples in x and 249 in y. To fix it, unpack the results in the order given in the documentation:

x_train, x_val, y_train, y_val = train_test_split(X, y, test_size = 0.2, random_state= 12345)
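
As a quick sanity check, the return order can be verified on stand-in arrays before training. This is only a sketch: it assumes NumPy and scikit-learn are installed, and the zero-filled arrays and their 4x4 shapes are placeholders, not the real images and masks.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (assumed shapes): 1244 "images" and 1244 "masks"
X = np.zeros((1244, 4, 4, 3), dtype=np.float32)
y = np.zeros((1244, 4, 4, 1), dtype=np.float32)

# Unpack as: first array's train/test, then second array's train/test
x_train, x_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=12345
)

# Images and masks must have matching sample counts in each split
assert x_train.shape[0] == y_train.shape[0]
assert x_val.shape[0] == y_val.shape[0]

print(x_train.shape[0], y_train.shape[0])  # 995 995
print(x_val.shape[0], y_val.shape[0])      # 249 249
```

The 995 and 249 here are the same counts that appear in the error message, which is what points to swapped variables rather than mismatched data.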
