Matching new set of data with pre-defined sets

Question

Georgio Sayegh

April 24, 2022, 05:04

I have data describing the level of each skill required for a set of tasks. The following is a tabulated example:

Note that the data values are on a scale from 0 to 10.

My problem is that I have a set of employees whose skills (analysis, patience, comprehension, ...) have been assessed, such as the following employee:

Analysis --> 8.5
Patience --> 5
Comprehension --> 7
Communication --> 7.5
Creativity --> 8

How can I match this employee to the best task, given his skill set and the skills required by each task, and also compute a matching percentage?

Note that the real number of tasks is much larger (around 1000), each with more requirements, which were determined statistically.

Topics: data-analysis, regression

Category: Data Science

rnso · Accepted Answer · September 18, 2018, 16:41

Taking a cue from the answer by Vincenzo Lavorini, the following Python code finds the task closest to the employee's abilities:

import numpy as np

# tt: list of task requirement vectors; ee: the employee's skill vector
tt = [[0,1,2,3,4], [4,3,2,1,0], [2,5,3,7,1], [1,0,1,0,1]]
ee = [0.5,0.5,0.5,0.5,0.5]

# convert to numpy array: 
tt = np.array(tt)
ee = np.array(ee)

# subtract the employee vector from every task row (NumPy broadcasting):
res = tt - ee
print(res)

Output:

[[-0.5  0.5  1.5  2.5  3.5]
 [ 3.5  2.5  1.5  0.5 -0.5]
 [ 1.5  4.5  2.5  6.5  0.5]
 [ 0.5 -0.5  0.5 -0.5  0.5]]

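Take the magnitude (Euclidean norm) of each difference vector, so that positive and negative gaps cannot cancel each other out:

res = np.linalg.norm(res, axis=1)
print([round(float(d), 4) for d in res])

Output:

[4.6098, 4.6098, 8.441, 1.118]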

Find the index of the task with the smallest difference using numpy.argmin:

print("Task number most suited for this employee:", np.argmin(res))

Output:

Task number most suited for this employee: 3

This gives the same task (index 3) as the scipy.spatial KDTree code.
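With around 1000 tasks, the same steps can be vectorized and used to rank several candidate tasks instead of returning only the best one. This is an illustrative sketch, not part of the original answer: `rank_tasks` is a hypothetical helper name, it uses the Euclidean magnitude suggested by Vincenzo Lavorini, and it assumes tied tasks should stay in their original order.

```python
import numpy as np


def rank_tasks(tasks, employee, k=3):
    """Hypothetical helper: return (indices of the k closest tasks, all distances).

    Distance is the Euclidean norm of (task - employee) for each task row;
    a smaller distance means a better match.
    """
    tasks = np.asarray(tasks, dtype=float)        # shape (n_tasks, n_skills)
    employee = np.asarray(employee, dtype=float)  # shape (n_skills,)
    # Broadcasting subtracts the employee vector from every task row at once.
    dist = np.linalg.norm(tasks - employee, axis=1)
    # A stable sort keeps tied tasks in their original order (an assumption).
    order = np.argsort(dist, kind="stable")
    return order[:k], dist


tt = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [2, 5, 3, 7, 1], [1, 0, 1, 0, 1]]
ee = [0.5, 0.5, 0.5, 0.5, 0.5]
best, dist = rank_tasks(tt, ee, k=2)
print(best)  # → [3 0]
```

Sorting all distances costs O(n log n), which is negligible for about 1000 tasks; for much larger task sets, the KDTree approach in the next answer avoids computing every distance.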

Georgio Sayegh · Accepted Answer · June 20, 2018, 10:10

As a follow-up to my question, after some research I found a programmatic Python approach in this thread: https://stackoverflow.com/questions/32446703/find-closest-vector-from-a-list-of-vectors-python

It uses SciPy's spatial module (scipy.spatial). The approach is fairly simple: you input a set of vectors, build a KDTree from them, and query the tree with an input vector. One inconvenience is that the input vector must have the same length as the other vectors, so some pre-processing is required.
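A minimal sketch of that pre-processing step, assuming the task requirements and the employee's skills are stored as {skill name: level} dictionaries and that any skill not listed counts as level 0 (both assumptions, not from the thread). `build_vectors` is a hypothetical helper name, and the two tasks are made-up examples since the original table is not reproduced here:

```python
def build_vectors(tasks, employee):
    """Hypothetical helper: return (skill_order, task_vectors, employee_vector).

    Every vector uses the same skill order, so all of them have the same length
    and can be passed to scipy.spatial.KDTree or compared directly.
    Assumption: a skill missing from a task or from the employee counts as 0.
    """
    # Union of every skill name seen anywhere, sorted for a deterministic column order.
    skills = sorted(set(employee).union(*(t.keys() for t in tasks)))
    task_vectors = [[t.get(s, 0.0) for s in skills] for t in tasks]
    employee_vector = [employee.get(s, 0.0) for s in skills]
    return skills, task_vectors, employee_vector


employee = {"Analysis": 8.5, "Patience": 5, "Comprehension": 7,
            "Communication": 7.5, "Creativity": 8}
# Hypothetical task requirements.
tasks = [{"Analysis": 9, "Creativity": 7},
         {"Patience": 8, "Communication": 9, "Leadership": 6}]

skills, task_vectors, employee_vector = build_vectors(tasks, employee)
print(skills)
print(employee_vector)
```

Filling gaps with 0 is only one choice; if a missing requirement should instead mean "irrelevant to this task", that skill would need to be excluded from the distance rather than set to 0.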

The code used is (query returns the distance to the nearest vector and that vector's index):

>>> from scipy import spatial
>>> A = [[0,1,2,3,4], [4,3,2,1,0], [2,5,3,7,1], [1,0,1,0,1]]
>>> tree = spatial.KDTree(A)
>>> tree.query([0.5,0.5,0.5,0.5,0.5])
(1.1180339887498949, 3)

Vincenzo Lavorini · Accepted Answer · June 15, 2018, 15:00

You don't need machine learning to do this.

You can subtract the vector describing the employee from the vector describing each task, and calculate the magnitude (Euclidean norm) of each resulting difference vector.

The difference vector with the smallest magnitude comes from the best-matching task.
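The question also asks for a matching percentage. One possible mapping (an assumption, not something stated in the answers) divides the magnitude by the largest distance possible on the 0 to 10 scale, which is 10·√n for n skills, and reports 100·(1 − distance / max distance). `match_percentage` and the task vectors below are hypothetical:

```python
import math


def match_percentage(task, employee, scale_max=10.0):
    """Hypothetical 0-100 % score derived from the Euclidean distance.

    Assumes every level lies in [0, scale_max], so the largest possible
    distance between two n-dimensional vectors is scale_max * sqrt(n).
    """
    if len(task) != len(employee):
        raise ValueError("task and employee vectors must have the same length")
    dist = math.dist(task, employee)  # magnitude of (task - employee), Python 3.8+
    max_dist = scale_max * math.sqrt(len(task))
    return 100.0 * (1.0 - dist / max_dist)


# Analysis, Patience, Comprehension, Communication, Creativity
employee = [8.5, 5, 7, 7.5, 8]
tasks = [[8, 6, 7, 7, 9], [2, 9, 3, 10, 1]]  # hypothetical requirement vectors
scores = [match_percentage(t, employee) for t in tasks]
best = max(range(len(tasks)), key=lambda i: scores[i])
print(best, [round(s, 1) for s in scores])
```

Because the score decreases monotonically with the distance, the task with the highest percentage is the same one the smallest magnitude selects; the percentage only adds an interpretable scale.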
