How does the Data Sampler widget's cross-validation option in Orange work?
I use the Data Sampler widget to split a dataset into train and test sets with the cross-validation option selected. I wonder how it works, because some points are not clear to me.
Question 1: As seen in the figure, I split the data into five subsets (each with 20 observations). I then selected one of the subsets as the remaining data to test the models, which means the other four subsets are used for training. At this point, while an algorithm builds a model, are the four subsets used one by one for training, following cross-validation logic? If so, does the process in the figure correspond to k-fold cross-validation with both a validation set and a test set? In other words, can we describe the process like this: one of the four subsets is selected as an inner test (validation) set while the others are used for training; this loop continues until each of the four subsets has served as the inner test set; finally, the subset we selected as the test set (the remaining data) is used to evaluate the trained model.
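To make the interpretation I am asking about concrete, here is a minimal sketch in plain Python. This is my own illustration of the nested (inner validation + outer test) procedure described above, not Orange's actual internals; the fold sizes match my setup of 100 observations split into five subsets of 20.

```python
# Sketch of the Question 1 interpretation (my own illustration in
# plain Python, NOT Orange's documented behavior): one fold is held
# out as the test set, and the remaining four folds are cycled
# through as inner validation sets.

n_obs = 100
n_folds = 5
fold_size = n_obs // n_folds

# Partition indices 0..99 into five folds of 20 observations each.
folds = [list(range(i * fold_size, (i + 1) * fold_size))
         for i in range(n_folds)]

test_fold = 0                      # the "remaining data" fold
test_idx = folds[test_fold]

inner_rounds = []
for val_fold in range(n_folds):
    if val_fold == test_fold:
        continue                   # never train or validate on the test fold
    val_idx = folds[val_fold]
    train_idx = [i for f in range(n_folds)
                 if f not in (test_fold, val_fold)
                 for i in folds[f]]
    inner_rounds.append((len(train_idx), len(val_idx)))

# Four inner rounds, each training on 60 observations and validating
# on 20; the 20 held-out test observations are untouched throughout.
print(inner_rounds)    # [(60, 20), (60, 20), (60, 20), (60, 20)]
print(len(test_idx))   # 20
```

Under this reading, the test fold would only be touched once, at the very end, to evaluate the final model.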
Question 2: As far as I understand, the number of unused subsets, which is set to 1 in the figure, corresponds to the remaining test data. We have the option to increase this value up to the number of subsets. What happens if I set this option to 2? Can we describe the process like this: one of the five subsets is already selected as the test set, which accounts for one of the 2 unused subsets; the other unused subset is then selected from the remaining subsets; and only three subsets are used for training, following the scenario above.
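Again to make the question concrete, here is a short sketch of the Question 2 scenario as I understand it. This is my own illustration, not Orange's documented behavior; which specific folds end up unused is an assumption for the example.

```python
# Sketch of the Question 2 scenario (my own illustration, NOT
# Orange's documented behavior): with 5 folds of 20 observations and
# the "unused subsets" option set to 2, two folds would be left out
# entirely (one of them serving as the test set) and three folds
# would remain for training.

n_folds = 5
fold_size = 20
unused = 2

all_folds = set(range(n_folds))
unused_folds = set(range(unused))      # assumed: folds 0 and 1 left out
training_folds = all_folds - unused_folds

print(sorted(unused_folds))                    # [0, 1]
print(sorted(training_folds))                  # [2, 3, 4]
print(len(training_folds) * fold_size)         # 60 training observations
```

My question is whether the training then proceeds over these three folds with the same inner-validation logic described in Question 1.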
Question 3: What happens if I set the number of unused subsets to 5?
I would be very glad to receive your answers. Thank you.
Tags: orange, cross-validation
Category: Data Science