Mean Accuracy and Standard Error of the Accuracy for the KNN Classification Algorithm

The code snippet below is from an assignment in the online course IBM ML with Python. Here's the assignment.

The variable names mean_acc and std_acc are ambiguous to me. I am trying to interpret them from the point of view of inferential statistics, but that interpretation conflicts with what the code does.

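import numpy as np
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

# X_train, X_test, y_train, y_test are defined earlier in the assignment
Ks = 10
mean_acc = np.zeros(Ks - 1)
std_acc = np.zeros(Ks - 1)

for n in range(1, Ks):
    # Train the model with n neighbours and predict on the test set
    neigh = KNeighborsClassifier(n_neighbors=n).fit(X_train, y_train)
    yhat = neigh.predict(X_test)

    # Accuracy of this single model on the test set
    mean_acc[n - 1] = metrics.accuracy_score(y_test, yhat)

    # Std of the correct/incorrect indicators divided by sqrt(test set size)
    std_acc[n - 1] = np.std(yhat == y_test) / np.sqrt(yhat.shape[0])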


Visualisation

import matplotlib.pyplot as plt

plt.plot(range(1, Ks), mean_acc, 'g')
plt.fill_between(range(1, Ks), mean_acc - 1 * std_acc, mean_acc + 1 * std_acc, alpha=0.10)
# The colour must be passed as a string; a bare `green` is an undefined name
plt.fill_between(range(1, Ks), mean_acc - 3 * std_acc, mean_acc + 3 * std_acc, alpha=0.10, color='green')
plt.legend(('Accuracy ', '+/- 1xstd', '+/- 3xstd'))
plt.ylabel('Accuracy ')
plt.xlabel('Number of Neighbors (K)')
plt.tight_layout()
plt.show()

My Doubt Regarding the Variable Names mean_acc and std_acc

To me, mean_acc sounds like the mean accuracy of the model for the given train and test data sizes.

std_acc sounds like the standard error of the accuracy.

From my knowledge of inferential statistics:

Let the size of the train data be 800 and the size of the test data be 200. Create a train and a test set, fix a value of K, build a KNN model, and record its accuracy on the test data. Repeat this process 200 times; each time the train and test data vary, but their sizes stay fixed. (We could create as many models as we like; here we are just drawing a sample of size 200.)

So now we have a sample of 200 accuracies, and we can use this sample to estimate the mean accuracy and the standard error of the accuracy for the model at the given data size.

From the Central Limit Theorem,

(1) The population mean accuracy is estimated by this sample mean.

(2) The standard error of the accuracy is given by: (standard deviation of the sample accuracies) / sqrt(sample size).
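
The repeated-split procedure described above can be sketched roughly as follows. This is only an illustration under assumptions: the synthetic dataset from make_classification stands in for the assignment's data, k = 4 is an arbitrary choice, and the names accs, mean_accuracy and se_accuracy are hypothetical. Note that the 200 splits are drawn from the same 1000 rows, so the accuracies are not strictly independent.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the assignment's data (assumption: 1000 rows, 2 classes)
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

k = 4            # fixed number of neighbours (arbitrary choice)
n_repeats = 200  # number of resampled train/test splits

accs = np.zeros(n_repeats)
for i in range(n_repeats):
    # Draw a new random 800/200 split on every repetition
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=200, random_state=i)
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    accs[i] = model.score(X_te, y_te)  # accuracy on this split's test set

# (1) sample mean of the 200 accuracies
mean_accuracy = accs.mean()
# (2) sample standard deviation of the accuracies / sqrt(sample size)
se_accuracy = accs.std(ddof=1) / np.sqrt(n_repeats)
print(mean_accuracy, se_accuracy)
```

Here every entry of accs is the accuracy of a different model, which is a different quantity from the single-model accuracy stored per value of K in the assignment code.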

My understanding and the variable names used in the code are not on the same page:

(1) In the code, mean_acc refers to the accuracy of a single model.

(2) In the code, std_acc is computed as the standard deviation of the correct/incorrect entries (e.g. [1,0,0,1,1,1...]) divided by the square root of the number of test observations.

I am not able to follow the code and its terminology. Could someone help me get clarity about the code and its interpretation?
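
To make point (2) concrete, here is a small sketch of what that expression evaluates to. The labels and predictions are made up for illustration and are not the assignment's data. For a 0/1 array with mean p, the population standard deviation is sqrt(p(1-p)), so the expression coincides with the usual standard error of a proportion, sqrt(p(1-p)/n), computed from one model's test predictions.

```python
import numpy as np

# Made-up labels and predictions (not the assignment's data): 7 of 10 correct
y_test = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
yhat = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

correct = (yhat == y_test)  # boolean array of hits and misses
n = correct.shape[0]        # number of test observations
p = correct.mean()          # accuracy of this single model

se_from_code = np.std(correct) / np.sqrt(n)  # the expression used for std_acc
se_binomial = np.sqrt(p * (1 - p) / n)       # standard error of a proportion

print(p, se_from_code, se_binomial)
```

The two standard-error values agree, which suggests std_acc can be read as the uncertainty of a single accuracy estimate due to the finite test set, rather than as a spread over repeated train/test splits.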
