In Python, how can I transfer or remove duplicate columns from one dataset so that the rows and columns of all datasets are equal?

JERE.tech

May 17, 2022, 08:48

So I've been trying to improve my Random Decision Tree model for the Titanic Challenge on Kaggle by introducing a validation dataset, and now I've hit the roadblock shown in the images below:

[Image: Validation Dataset]

[Image: Test Dataset]
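For context on the validation step: one common way to get a validation set (not necessarily what the notebook does) is to hold out a random slice of the labelled training data, so the validation part automatically has the same columns as the training part. A minimal pandas sketch, assuming the validation rows come from `train.csv`; the function name `split_train_validation` and the toy frame are hypothetical.

```python
import pandas as pd

def split_train_validation(df: pd.DataFrame, frac: float = 0.8, seed: int = 42):
    """Randomly hold out (1 - frac) of the rows as a validation set (hypothetical helper)."""
    train = df.sample(frac=frac, random_state=seed)
    # Every row not sampled into `train` becomes validation data, so the two
    # parts never overlap and share exactly the same columns.
    valid = df.drop(train.index)
    return train, valid

# Hypothetical toy frame standing in for Kaggle's train.csv
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0, None],
                   "Fare": [7.25, 71.28, 7.92, 53.1, 8.05],
                   "Survived": [0, 1, 1, 1, 0]})
train, valid = split_train_validation(df)
print(train.shape, valid.shape)
```

This relies on the frame having a unique index (the default `RangeIndex` from `read_csv` does). scikit-learn's `train_test_split` does the same job if it is already in use.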

After inspecting these datasets with the .info() method, I found that the Validation Dataset's columns contain 178 and 714 non-null float values, while the Test Dataset's columns contain a mix of 178 and 419 non-null float and integer values.
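`.info()` reports a non-null count per column, so differing counts within one frame usually point to missing values rather than extra rows; the row count itself is `len(df)`. A sketch for comparing two frames column by column (the helper `compare_frames` and the toy data are hypothetical):

```python
import pandas as pd

def compare_frames(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Side-by-side non-null counts and dtypes over the union of both frames' columns."""
    # Building a DataFrame from Series with different indexes aligns them,
    # so a column present in only one frame shows NaN on the other side.
    return pd.DataFrame({
        "a_non_null": a.count(),
        "b_non_null": b.count(),
        "a_dtype": a.dtypes.astype(str),
        "b_dtype": b.dtypes.astype(str),
    })

valid = pd.DataFrame({"Age": [22.0, None, 30.0], "Fare": [7.25, 8.05, 9.5]})
test = pd.DataFrame({"Age": [40.0, 25.0], "Pclass": [1, 3]})
report = compare_frames(valid, test)
print(report)
print("Only in valid:", sorted(set(valid.columns) - set(test.columns)))
print("Only in test:", sorted(set(test.columns) - set(valid.columns)))
```

The two "Only in" lists are usually the quickest way to see which columns need to be added or dropped before the frames can be fed to the same model.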

Furthermore, the datasets contain duplicate rows, which I think should be moved to the appropriate dataset and/or removed, but I still don't know the code to do this.
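pandas can drop exact duplicates on both axes. A minimal sketch, assuming "duplicate columns" means repeated column *names* and "duplicate rows" means fully identical rows; the helper name is hypothetical. One caution: rows of the Kaggle test set generally should not be dropped, since the submission expects a prediction for every `PassengerId`.

```python
import pandas as pd

def drop_duplicates_both_axes(df: pd.DataFrame) -> pd.DataFrame:
    # Keep only the first column for each repeated column name
    # (e.g. "Age" appearing twice after an accidental concat or merge).
    df = df.loc[:, ~df.columns.duplicated()]
    # Then keep only the first occurrence of each fully identical row.
    return df.drop_duplicates()

df = pd.DataFrame([[1, 22.0, 22.0], [1, 22.0, 22.0], [2, 38.0, 38.0]],
                  columns=["PassengerId", "Age", "Age"])
clean = drop_duplicates_both_axes(df)
print(clean)
```

Columns are de-duplicated first so that `drop_duplicates` always runs on a frame with unique column labels.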

I've spent many hours trying to find where I went wrong in the Advanced Feature Transformation part, but I've concluded that I can't get past this roadblock without asking, hence this question.
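One frequent cause of column mismatches after feature transformation (an assumption here, since the notebook isn't visible) is one-hot encoding each dataset separately, which yields different dummy columns whenever a category appears in only one of them. Only the columns need to match; the number of rows can differ. A sketch that forces the other frame onto the training columns with `reindex` (the helper `align_columns` and the toy data are hypothetical):

```python
import pandas as pd

def align_columns(train_X: pd.DataFrame, other_X: pd.DataFrame) -> pd.DataFrame:
    """Give other_X exactly train_X's columns, in the same order.

    Dummy columns missing from other_X are added and filled with 0;
    columns that exist only in other_X are dropped.
    """
    return other_X.reindex(columns=train_X.columns, fill_value=0)

train = pd.DataFrame({"Sex": ["male", "female"], "Embarked": ["S", "C"]})
test = pd.DataFrame({"Sex": ["female"], "Embarked": ["Q"]})

# Encoding each frame separately produces different dummy columns.
train_X = pd.get_dummies(train)
test_X = align_columns(train_X, pd.get_dummies(test))
print(list(train_X.columns))
print(test_X)
```

Dropping test-only dummies (here `Embarked_Q`) discards that information; encoding the concatenated frames once, or declaring fixed categories, avoids the mismatch at the source.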

Your help and expertise are much appreciated!

Here is the link to my Kaggle Notebook for more information.

Sincerely,

J.E.

Data Scientist-in-training

Topics: data-science-model, random-forest, classification, python

Category: Data Science