Difference between DataSampler and TestAndScore widgets in Orange
I'm trying to figure out the difference between using the data sampler to get a 70/30 train/test split and directly using the test and score widget to do so via random sampling. My workflow as follows
This is how my test and score widget looks for the case without the data sampler
and this is how my data sampler widget looks
I see very different results in the confusion matrix at the end between the two. Using the data sampler, I get a much better model than without it. However, if I directly try and leverage the train_test_split function with LogisticRegression in scikit-learn with similar hyper params as Orange(e.g. for solver, C, class_weights etc.) My results there are much closer to what I see in Orange without using data sampler.
Could someone help me figure out what I'm missing?
What is the difference between the two widgets in how I'm using them?
Does the 70% in
DataSamplernot correspond to atrain_test_splitfunction in scikit-learn withtrain_size=0.7?
Category Data Science



