In Python, how can I transfer or remove duplicate columns from one dataset so that the rows and columns of all datasets are equal?

So I've been trying to improve my Random Decision Tree model for the Titanic Challenge on Kaggle by introducing a Validation Dataset, and now I've run into a roadblock, shown in the screenshots below:

[Screenshot: Validation Dataset]

[Screenshot: Test Dataset]

After inspecting these datasets with the .info() method, I've found that the Validation Dataset has columns with 178 and 714 non-null float values, while the Test Dataset has columns with 178 and 419 non-null values of mixed float and integer types.
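For reference, here is a minimal sketch of how the two DataFrames could be compared side by side; the variable names `valid_df` and `test_df` (and the file names) are assumptions, since the actual names in the notebook are not shown:

```python
import pandas as pd

# Hypothetical names; substitute the DataFrames used in the notebook.
valid_df = pd.read_csv("validation.csv")  # assumed file name
test_df = pd.read_csv("test.csv")         # assumed file name

# Compare shapes, column sets, and dtypes side by side.
print(valid_df.shape, test_df.shape)
print(set(valid_df.columns) ^ set(test_df.columns))  # columns present in only one frame
print(pd.concat([valid_df.dtypes, test_df.dtypes], axis=1, keys=["valid", "test"]))
```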

Further, the Datasets contain duplicate rows, which I think should be transferred to the appropriate Dataset and/or removed, but I don't yet know how to write the code for this.
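In case it helps while answers come in, below is a hedged sketch of the usual pandas approach, reusing the assumed `valid_df`/`test_df` names from above: `drop_duplicates()` removes repeated rows within a frame, restricting both frames to a shared column list makes their column sets match, and a merge with `indicator=True` flags rows the two frames have in common so they can be dropped from one of them.

```python
# Remove rows that are exact duplicates within each DataFrame.
valid_df = valid_df.drop_duplicates().reset_index(drop=True)
test_df = test_df.drop_duplicates().reset_index(drop=True)

# Keep only the columns the two DataFrames share, in the same order,
# so both end up with identical column sets.
common_cols = [col for col in valid_df.columns if col in test_df.columns]
valid_df = valid_df[common_cols]
test_df = test_df[common_cols]

# Flag rows that also appear in the validation set and drop them from
# the test set (a left merge on all shared columns; rows containing NaN
# or mismatched dtypes will not be matched).
merged = test_df.merge(valid_df, how="left", indicator=True)
test_df = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```

Whether the overlapping rows should be removed from the test set or from the validation set depends on how the split was made in the notebook, so treat the last step as illustrative rather than a fix for this specific notebook.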

I've spent many hours trying to locate where I messed up in the Advanced Feature Transformation part, but I've come to the conclusion that I cannot overcome this roadblock without asking questions, hence here I am.

Your help and expertise are much appreciated!

Here is the link to my Kaggle Notebook, for more information.

Sincerely,

J.E.

Data Scientist-in-training

Topic: data-science-model, random-forest, classification, python

Category: Data Science
