Many separation lines using an RBF kernel in SVM
Below is my code. It takes a column of numbers and creates a new column, label, that contains either -1 or 1:
If the number is higher than 14000, it is labelled -1 (outlier).
If the number is lower than 14000, it is labelled 1 (normal).
## Here I just import all the libraries and load the column of my dataset into df
## Yes, I am trying to find anomalies using only the data from one column
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import svm
df['label'] = [-1 if x > 14000 else 1 for x in df['data_numbers']]  # what I explained above: > 14000 -> -1 (outlier), otherwise 1 (normal)
data = df.drop('label',axis=1)
target = df['label']
outliers = df[df['label']==-1]
outliers = outliers.drop('label',axis=1)
from sklearn.model_selection import train_test_split
train_data, test_data, train_target, test_target = train_test_split(data, target, train_size = 0.8)
train_data.shape
nu = outliers.shape[0] / target.shape[0]
print("nu", nu)
model = svm.OneClassSVM(nu=nu, kernel='rbf', gamma=0.00005)
model.fit(train_data)
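(For reference, nu in scikit-learn's OneClassSVM is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, which is why I set it to the observed outlier fraction. Below is a quick sanity check on the fitted model, just an inspection sketch that nothing else depends on.)
# sanity check: the fraction of support vectors should be at least roughly nu
n_sv = model.support_vectors_.shape[0]
print("support vectors:", n_sv, "fraction of training set:", n_sv / train_data.shape[0])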
from sklearn import metrics
preds = model.predict(train_data)
targs = train_target
print("accuracy: ", metrics.accuracy_score(targs, preds))
print("precision: ", metrics.precision_score(targs, preds))
print("recall: ", metrics.recall_score(targs, preds))
print("f1: ", metrics.f1_score(targs, preds))
print("area under curve (auc): ", metrics.roc_auc_score(targs, preds))
train_preds = preds
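Note that precision, recall and f1 above use scikit-learn's default pos_label=1, so they score the normal class. As an optional extra check (not something the rest of the code depends on), the same metrics can be computed for the -1 outlier class:
# score the outlier class explicitly; pos_label selects which label counts as positive
print("outlier precision: ", metrics.precision_score(targs, preds, pos_label=-1))
print("outlier recall: ", metrics.recall_score(targs, preds, pos_label=-1))
print("outlier f1: ", metrics.f1_score(targs, preds, pos_label=-1))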
preds = model.predict(test_data)
targs = test_target
print("accuracy: ", metrics.accuracy_score(targs, preds))
print("precision: ", metrics.precision_score(targs, preds))
print("recall: ", metrics.recall_score(targs, preds))
print("f1: ", metrics.f1_score(targs, preds))
print("area under curve (auc): ", metrics.roc_auc_score(targs, preds))
test_preds = preds
from mlxtend.plotting import plot_decision_regions  # since an RBF SVM is used, many decision boundaries are drawn, unlike the single one of a linear SVM
# at the top, the central points with blue squares are outliers, while at the bottom the orange triangles are normal values
plot_decision_regions(np.array(train_data), np.array(train_target), model)
plt.show()
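To see where those boundaries actually sit, one can also sample the model's decision function over a grid of the single feature and count its sign changes; each sign change corresponds to one separation point in the plot. This is just a rough inspection sketch, assuming the feature matrix really has only this one column:
# sample the decision function over a 1-D grid; each sign change is one boundary
grid = np.linspace(train_data.values.min(), train_data.values.max(), 1000).reshape(-1, 1)
scores = model.decision_function(grid)
n_boundaries = int(np.sum(np.diff(np.sign(scores)) != 0))
print("number of sign changes (separation points):", n_boundaries)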
Output from training data
accuracy: 0.9050484526414505
precision: 0.9974137931034482
recall: 0.907095256762054
f1: 0.9501129131595154
area under curve (auc): 0.5876939698444417
Output from test data
accuracy: 0.9043451078462019
precision: 1.0
recall: 0.9040752351097179
f1: 0.9496213368455713
area under curve (auc): 0.9520376175548589
My graph seems to have so many separation lines; I was thinking I would only get one that differentiates between the outliers and the normal data.