KNN improvements (python)

I recently had to work on a problem where the best baseline was KNN (geolocalised data). I have different targets (binary classification, multiclass classification and regression) and associated metrics, so I use KNN interchangeably for classification or regression. This baseline was easy to implement in Python (sklearn). I was wondering how to improve the baseline. I tried tuning the KNN hyperparameters. Optimising k helped a bit; changing the distance metric did not (the natural L2 distance worked best by far). Other models gave …
Category: Data Science

Date transformation for KNN

I have a data set with date features like 01/01/2019 and I would like to use KNN. However, I cannot find a good transformation for dates that gives a meaningful distance for the last feature. For example:

f1 | 1 | 2 | 3 | 4 | 01/01/2019
f2 | 10 | 3 | 12 | 1 | 14/01/2019

Does anyone have any recommendations?
Category: Data Science
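
One possible direction for the date question above, as a minimal sketch rather than a definitive answer: convert each date into a day count (so the distance grows with elapsed time) and, if seasonality matters, add a sine/cosine pair for the position within the year. The function name `encode_dates` and the choice of the earliest date as the reference point are assumptions made for illustration.

```python
from datetime import datetime
import math

def encode_dates(dates, fmt="%d/%m/%Y"):
    """Turn date strings into numeric features usable in a distance metric."""
    parsed = [datetime.strptime(d, fmt) for d in dates]
    start = min(parsed)  # assumption: measure time from the earliest date
    rows = []
    for p in parsed:
        # Linear component: days elapsed since the earliest date.
        days = float((p - start).days)
        # Cyclical component: day of year mapped onto a circle, so that
        # late December and early January end up close together.
        angle = 2 * math.pi * (p.timetuple().tm_yday - 1) / 365.25
        rows.append((days, math.sin(angle), math.cos(angle)))
    return rows

rows = encode_dates(["01/01/2019", "14/01/2019"])
for row in rows:
    print(row)
```

The day count will usually have a much larger range than the other features, so it would typically need scaling (for example standardisation) before being fed to KNN together with them.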

How to solve "ValueError: y should be a 1d array, got an array of shape () instead."?

from tkinter import *
from tkinter import ttk
from tkmacosx import Button

top = Tk()
top.title("Jobs")
top.geometry("1000x800")

line1 = LabelFrame(top, text='')
line1.pack(expand = 'yes', fill = 'both')

n = StringVar()
categorychoosen = ttk.Combobox(line1, width = 27, textvariable = n)
# Adding combobox drop down list
categorychoosen['values'] = ('Advocate','Arts','Automation Testing','Blockchain','Business Analyst', 'Web Designing')
categorychoosen.place(x=50, y=150)
categorychoosen.current()

name=Label(line3,text="Welcom to ... company",font =("Arial", 10))
name.place(x=0, y=0)

n1 = StringVar()
sectionchoosen = ttk.Combobox(line3, width = 27, textvariable = n1)
# Adding combobox drop down …
Category: Data Science

Using KNN to categorise inventory (physical stock items) - is it the best way?

I'm working on a machine learning problem involving inventory (i.e. physical retail stock). However, during the cleaning (outlier removal) process some of the items (via their corresponding transactions) will be removed. Therefore, I thought of using KNN to group similar items into respective categories. There are 1245 items. The info for each item is: Average Weighted Price, Total Quantity Sold, Total Revenue Achieved, Min Sold per Transaction, Max Sold per Transaction, Min Sell Price, Max Sell Price, Number of Unique …
Category: Data Science

Missing value Imputation in dataset

I have two separate files for testing and training. In the training data, I am dropping rows that contain too many missing values. But in the test data, I cannot afford to drop the rows, so I have chosen to impute the missing values using a KNN approach. My question is: to impute missing values in the test data using KNN, is it enough to consider only the test data? As in, neighbors …
Category: Data Science
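
As a hedged illustration of one common setup for the imputation question above (not necessarily the right choice for the asker's data), scikit-learn's `KNNImputer` can be fitted on the training rows and then applied to the test rows, so that the neighbours used to fill a test value come from the training set. The tiny arrays below are made up for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up training data with no missing values (rows with gaps were dropped).
X_train = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
])

# Made-up test data where rows cannot be dropped.
X_test = np.array([
    [2.0, np.nan],
    [np.nan, 6.0],
])

# Fit on the training rows only; transform() then looks up neighbours
# among those training rows to fill the gaps in the test rows.
imputer = KNNImputer(n_neighbors=1).fit(X_train)
X_test_filled = imputer.transform(X_test)
print(X_test_filled)
```

Fitting on the test set alone would also run, but the neighbours would then be limited to other test rows, which may be few or themselves incomplete.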

How can I find two similar sets of customers using KNN?

I have a study where I want to find users similar to a set of users (SEED). My data looks like a pivot by customer, e.g. a sample of SEED looks like this (note I drop cust_id):

cust_id | spend_food | spend_nike | spend_harrods |
1 | 145 | 45 | 32 |
2 | 85 | 89 | 0 |
4 | 23 | 67 | 1900 |
5 | 84 | 12 | 900 |

So to find users similar …
Category: Data Science

Mixed Data Type Classification / Neighbor Algorithm

Here is a hypothetical simplified dataframe of my problem, which would be low dimensional (20ish features), containing some made-up information about certain dog breeds:

Breed   Min_Weight  Max_Weight  Min_Height  Max_Height  is_friendly  grp
Husky   10          20          30          35          True         working
Poodle  8           17          15          30          False        terrier

The algorithm would receive some information about a dog, and it would need to identify k-closest dog breeds based on the input data. It needs to be high performance. Example: algorithm receives an unknown breed …
Category: Data Science

Adding a supervising process during the KNN process

I am trying to improve my KNN regression process (I would like to use sklearn / Python, but it doesn't matter). I would like to improve my results and to gain insight. Here is an example: I have data measured from an electric motor: an input voltage (U) and current (I) and an output torque (T) and speed (S). My first attempt is a simple approach where I give the data as is to a KNN algorithm and I use the …
Category: Data Science

Does knn extend the train dataset by test values during the prediction?

Let's say I have 100 values in my dataset and split it 80% train, 20% test. When predicting the last value, is the prediction based on the previous 99 (80 train + 19 already predicted values) or only the original 80 train values? For example: if a kd-tree is used, is every data point inserted into the tree during the prediction? Is it possible to use KNN for the following scenario? I have 20 train values, when I add a new observation I …
Category: Data Science
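
For scikit-learn specifically, the behaviour asked about above can be checked with a small experiment. This is a sketch on synthetic data, assuming `KNeighborsRegressor`: it records how many samples the fitted model holds (`n_samples_fit_`) and predicts the last test point both before and after predicting the other 19.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data: 100 points, split 80 train / 20 test.
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = X.sum(axis=1)
X_train, y_train = X[:80], y[:80]
X_test = X[80:]

model = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)

n_before = model.n_samples_fit_
first = model.predict(X_test[-1:])   # predict the last test point first
model.predict(X_test[:-1])           # then predict the other 19 points
again = model.predict(X_test[-1:])   # and the last point once more
n_after = model.n_samples_fit_

print(n_before, n_after)
print(first, again)
```

If the two predictions match and the sample count stays at 80, the predicted points were not added to the index. To make the model use new observations, it would have to be refitted on the enlarged training set.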

Item-based recommender using K-NN

I'm trying to build an item-based recommender using k-nn. I have a list of items, all of which have some properties (features) in common.

item    var_1        var_2  var_3        var_4        var_5
item_1  0.171547232  a      0.908855471  0.292061808  0.285678293
item_2  0.131694336  b      0.432665234  0.501300418  0.756824175
item_3  0.144318764  b      0.238752071  0.487600679  0.203133779
item_4  0.249241125  b      0.921229689  0.003638622  0.606875991
item_5  0.414306046  b      0.190824352  0.937412611  0.1789091
item_6  0.909501131  c      0.847112499  0.548322302  0.060136059
item_7  0.37469644   c      0.282628025  0.211128351  0.125910578
item_8  0.308634676  d      0.174650423  0.705026302  0.440098246
item_9  0.039294192  …
Category: Data Science

Table from results of sknn function (klaR package) won't output

I have a data set with 6 variables that I'm trying to run the sknn function on and then output a table of the results to show k-NN results. I have updated the response variable to a factor to use as row and column headers in the table, and checked the data types of all other variables to make sure they are compatible (int and num). For some reason, no matter what I try, R freezes trying to pull the …
Topic: k-nn r
Category: Data Science

KNN efficient implementation

The KNN algorithm is very handy and particularly suited to some of my problems, but I can't find any resources on how to implement it in production. As a comparative example, when I use a neural network, I already have high-level tools at my disposal for applying the neural network to examples (either libraries that let me make smart use of my devices' hardware when I target embedded systems, or infrastructure that lets me use my neural …
Category: Data Science

Problems with KNN using tidymodels

I am analyzing a database and I want to perform a KNN. I am using the 'tidymodels' library and when I run the model, I get the following error:

All models failed. See the `.notes` column.
# Tuning results
# 10-fold cross-validation repeated 5 times
There were issues with some computations:
- Error(s) x1000: Error in check_outcome(): ! For a classification model, the outcome should be a factor.
Use collect_notes(object) for more information.

The database is composed of the following …
Category: Data Science

How to save a knn model?

I need to save the results of a fit of the sklearn NearestNeighbors model:

knn = NearestNeighbors(10)
knn.fit(my_data)

How do you save the trained knn to disk using Python?
Category: Data Science
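
A minimal sketch of one way to do this, assuming `joblib` (installed alongside scikit-learn) is acceptable: dump the fitted estimator to a file and load it back later. The random `my_data` array and the temporary file path are placeholders for illustration. The example passes `n_neighbors` by keyword, which recent scikit-learn releases require.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder data standing in for the asker's my_data.
rng = np.random.default_rng(0)
my_data = rng.random((50, 3))

knn = NearestNeighbors(n_neighbors=10)
knn.fit(my_data)

# Save the fitted estimator to disk, then load it back.
path = os.path.join(tempfile.mkdtemp(), "knn.joblib")
joblib.dump(knn, path)
restored = joblib.load(path)

# The restored model should return the same neighbours as the original.
dist_a, idx_a = knn.kneighbors(my_data[:5])
dist_b, idx_b = restored.kneighbors(my_data[:5])
print(np.array_equal(idx_a, idx_b))
```

The standard-library `pickle` module works in much the same way. In either case the file should be loaded with the same scikit-learn version that wrote it, and only from a trusted source.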

Making Sense of this Error Message

I am using a book and a video to learn how to use the KNN method to classify movies according to their genres. This is my code:

import numpy as np
import pandas as pd

r_cols = ['user_id', 'movie_id', 'rating']
ratings = pd.read_csv('C:/Users/dell/Downloads/DataScience/DataScience-Python3/ml-100k/u.data', sep='\t', engine='python', names=r_cols, usecols=range(3))  # The file is u.data from MovieLens
print(ratings.head())

movieProperties = ratings.groupby('movie_id').agg({'rating': [np.size, np.mean]})
print(movieProperties.head())

movieNumRatings = pd.DataFrame(movieProperties['rating']['size'])
movieNormalizedNumRatings = movieNumRatings.apply(lambda x: (x - np.min(x)) / (np.max(x) - np.min(x)))
print(movieNormalizedNumRatings.head())

movieDict = {}
with open('C:/Users/dell/Downloads/DataScience/DataScience-Python3/ml-100k/u.item') as …
Category: Data Science

k-Nearest Neighbours with time series data - how to obtain whole-time-period estimators

I have a large dataset for the activities performed by multiple staff in a factory over a long period of time (01/01/2017 to present). The activities performed by the different staff are recorded at each point in time (since they interact with software). I have tabulated these to record the number of activities performed by each operator for each day. My table looks something like this:

Date        Name    Activity   UnitsProcessed  Shift    Team
01/10/2017  MMouse  Soldering  1000            Shift A  Team …
Category: Data Science

Recommendations based on other products seen

I am trying to develop a basic book recommender system to get acquainted with the field and start learning methods and how to prepare the data. The dataframe I am using is pretty plain; it has the following structure (this is a simplified example):

   number  type    username   product  publishing_dt  price  genres
0  34      access  kerrigan   130365   2019-12-10     16.99  fantasy, kids
1  1       order   kerrigan   76863    2020-01-15     4.66   action, crime
2  1       order   45michael  76863    2020-01-15     4.66   action, crime
3  …
Category: Data Science

New classification in Machine Learning KNN model

This is my example of a KNN model (written in R):

library(gmodels)
library(caret)
library(class)

db_class <- iris
row_train <- sample(nrow(db_class),nrow(db_class)*0.8)
db_train_x <- db_class[row_train,-ncol(db_class)]
db_train_y <- db_class[row_train,ncol(db_class)]
db_test_x <- db_class[-row_train,-ncol(db_class)]
db_test_y <- db_class[-row_train,ncol(db_class)]

model_knn <- knn(db_train_x,db_test_x,db_train_y,12)
summary(model_knn)
CrossTable(x=db_test_y,y=model_knn,prop.chisq = FALSE)
confusionMatrix(data=factor(model_knn),reference=factor(db_test_y))

So, this is a supervised KNN model. How can I classify a new record? I have this new record:

new_record <- c(5.3,3.2,2.0,0.2)

How can I classify it using the previous model?
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.