In Python, how can I transfer/remove duplicate columns from one dataset, such that the rows and columns of all datasets would be equal?
I've been trying to improve my Random Decision Tree model for the Titanic Challenge on Kaggle by introducing a Validation Dataset, and I've now hit a roadblock, as shown in the images below:
After inspecting these datasets with the .info() method, I found that the columns of the Validation Dataset hold 178 and 714 non-null floats, while the columns of the Test Dataset hold an assortment of 178 and 419 non-null floats and integers.
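To make the mismatch easier to see than in two separate .info() printouts, here is a minimal sketch of a side-by-side comparison. It assumes the data are pandas DataFrames; the names X_valid and X_test and the toy values are hypothetical stand-ins, not the actual frames from my notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the validation and test feature frames.
X_valid = pd.DataFrame({
    "Age": [22.0, None, 35.0],
    "Fare": [7.25, 71.28, 8.05],
    "Sex_male": [1.0, 0.0, 1.0],
})
X_test = pd.DataFrame({
    "Age": [34.5, 47.0],
    "Fare": [7.83, None],
    "Pclass": [3, 3],
})

# Columns that exist in one frame but not in the other.
only_in_valid = sorted(set(X_valid.columns) - set(X_test.columns))
only_in_test = sorted(set(X_test.columns) - set(X_valid.columns))

# Dtypes and non-null counts per column, similar to what .info() reports,
# but aligned on column name so differences line up row by row.
summary = pd.DataFrame({
    "valid_dtype": X_valid.dtypes.astype(str),
    "test_dtype": X_test.dtypes.astype(str),
    "valid_non_null": X_valid.count(),
    "test_non_null": X_test.count(),
})

print("Only in validation:", only_in_valid)
print("Only in test:", only_in_test)
print(summary)
```

A column that shows NaN on one side of the summary is missing from that frame, and a dtype that differs between the two sides points to a transformation that was applied to only one dataset.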
Further, the Datasets contain duplicate rows, which I think should be moved to the appropriate Dataset and/or removed, but I don't yet know how to write the code for either.
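For concreteness, this is a rough sketch of the kind of operation I have in mind, not a confirmed fix. It assumes pandas DataFrames; the names train and test, the toy values, and the choice to fill missing columns with 0 are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for a training frame and a test frame.
train = pd.DataFrame({
    "Age": [22.0, 22.0, 35.0],
    "Fare": [7.25, 7.25, 8.05],
    "Sex_male": [1.0, 1.0, 0.0],
})
test = pd.DataFrame({
    "Fare": [7.83, 9.0],
    "Age": [34.5, 47.0],
    "Extra": [1, 2],
})

# Count exact duplicate rows, then drop them (the first occurrence is kept).
n_dupes = int(train.duplicated().sum())
train_clean = train.drop_duplicates().reset_index(drop=True)

# Give the test frame exactly the training columns, in the same order.
# Columns missing from test are created and filled with 0 (an assumption);
# columns that exist only in test are dropped.
test_aligned = test.reindex(columns=train_clean.columns, fill_value=0)

print(n_dupes)
print(list(test_aligned.columns))
```

Two caveats apply to this sketch. If rows are dropped from a feature frame, the same rows have to be dropped from the matching target, otherwise the two fall out of step. Also, only the columns need to match across the datasets for the model to predict; the row counts of the training, validation, and test sets are expected to differ.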
I've spent many hours trying to locate where I messed up in the Advanced Feature Transformation part, but I've come to the conclusion that I cannot overcome this roadblock without asking questions, hence here I am.
Your help and expertise are much appreciated!
This is the link to access my Kaggle Notebook, for more information.
Sincerely,
J.E.
Data Scientist-in-training