Does k-NN extend the training dataset with test values during prediction?
Let's say I have 100 values in my dataset and split it 80% train / 20% test. When predicting the last test value, is the prediction based on the previous 99 values (80 train + 19 already-predicted values) or only on the original 80 train values?
For example: if a kd-tree is used, is every data point inserted into the tree during prediction?
Is it possible to use k-NN for the following scenario? I have 20 training values. When I add a new observation, I classify it and add it to the training dataset, so there are 21 values; the next time I add a new value, I classify it based on those 21 values. I understand this is probably not how it should be done, but imagine I keep adding values up to 50k, so the last one is classified based on the previous 49,999 values.
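To make the incremental workflow concrete, here is a minimal sketch of what I mean (the 1-D values and labels are made up; as far as I can tell, scikit-learn's `KNeighborsClassifier` does not grow its fitted data on `predict`, so I append and refit manually):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Small made-up training set: two clusters on a line
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)

# Classify a new observation using only the 4 fitted points
new_point = np.array([[9.5]])
pred = clf.predict(new_point)

# Manual "online" update: append the new point with its predicted
# label and refit, so the next query is answered from 5 points
X_train = np.vstack([X_train, new_point])
y_train = np.append(y_train, pred)
clf.fit(X_train, y_train)
```

Is refitting like this the only way, or does the classifier do something equivalent internally during `predict`?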
Another simplified example I came up with, with n=2: in pictures 1, 2, and 3 we see the points as they were trained and one new green point that will be classified. Then we take a new observation; are the distances calculated to the points as in 4a or as in 4b? link to visualization
Imagine it's Python's sklearn module doing the classification. Up until picture 1 we called .fit(X_train, y_train), where the train dataset had 4 points. Then we called .predict(X_test), where X_test had 2 points.
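In code, the picture scenario would look roughly like this (the coordinates and labels are made up for illustration; the question is whether the second test point is measured against 4 or 5 points):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# The 4 points visible up to picture 1
X_train = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
y_train = np.array(["red", "red", "blue", "blue"])

clf = KNeighborsClassifier(n_neighbors=2, algorithm="kd_tree")
clf.fit(X_train, y_train)

# The 2 test points classified afterwards; does the prediction of
# the second one see the first one (case 4a) or not (case 4b)?
X_test = np.array([[0.5, 0.5], [4.5, 5.5]])
preds = clf.predict(X_test)
```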
Topic k-nn supervised-learning classification machine-learning
Category Data Science