Matching new set of data with pre-defined sets

I have data describing the skill levels required for a set of tasks. The following is a tabulated example:

Note that the data values are on a scale from 0 to 10.

My problem here is that I have a set of employees whose skills (analysis, patience, comprehension ...) have been analyzed, like the following employee:

  • Analysis --> 8.5
  • Patience --> 5
  • Comprehension --> 7
  • Communication --> 7.5
  • Creativity --> 8

How can I match this employee to the best task, given his skill set and the requirements of each task, and also compute a matching percentage?

Please note that the real number of tasks is much larger (around 1000), with more requirements, which were determined statistically.



Taking a cue from the answer by Vincenzo Lavorini, the following Python code finds the task closest to the employee's abilities:

import numpy as np

# Let tt be list of tasks and ee be employee: 
tt = [[0,1,2,3,4], [4,3,2,1,0], [2,5,3,7,1], [1,0,1,0,1]]
ee = [0.5,0.5,0.5,0.5,0.5]

# convert to numpy array: 
tt = np.array(tt)
ee = np.array(ee)

# find difference between each task and employee:
res = tt - ee
print(res)

Output:

[[-0.5  0.5  1.5  2.5  3.5]
 [ 3.5  2.5  1.5  0.5 -0.5]
 [ 1.5  4.5  2.5  6.5  0.5]
 [ 0.5 -0.5  0.5 -0.5  0.5]]

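Find the magnitude (Euclidean norm) of each row of differences. A plain sum of the signed differences would let positive and negative gaps cancel each other out:

res = np.linalg.norm(res, axis=1)
print(res)

Output:

[4.60977223 4.60977223 8.44097151 1.11803399]

Find the index of the task with the smallest distance using numpy.argmin: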

print("Task number most suited for this employee:", np.argmin(res))

Output:

Task number most suited for this employee: 3

The result is the same as the one from the scipy.spatial code.
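
With around 1000 tasks it may be more useful to rank every task than to pick only the best one. The following is a minimal sketch of that idea, reusing the same toy data; the helper name rank_tasks is hypothetical and not part of the answer above.

```python
import numpy as np

def rank_tasks(tasks, employee):
    """Return task indices sorted from closest to farthest, with their distances."""
    tasks = np.asarray(tasks, dtype=float)
    employee = np.asarray(employee, dtype=float)
    # Euclidean distance between the employee and every task (one value per row).
    distances = np.linalg.norm(tasks - employee, axis=1)
    # A stable sort keeps the original order for tasks at equal distance.
    order = np.argsort(distances, kind="stable")
    return order, distances[order]

tt = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [2, 5, 3, 7, 1], [1, 0, 1, 0, 1]]
ee = [0.5, 0.5, 0.5, 0.5, 0.5]

order, dist = rank_tasks(tt, ee)
print(order)  # task indices, best match first
print(dist)   # matching distances, smallest first
```

Taking the first few entries of order gives a shortlist of candidate tasks for the employee.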


As a follow-up to my question, and after some research, I found a programmatic approach in Python in this thread: https://stackoverflow.com/questions/32446703/find-closest-vector-from-a-list-of-vectors-python

It describes how to use SciPy's spatial module. The approach is fairly simple: you build a KDTree from a set of vectors, then query the tree with an input vector. One inconvenience arises: the input vector must have the same length as the other vectors, so some pre-processing is required.

The code used is:

>>> from scipy import spatial
>>> A = [[0,1,2,3,4], [4,3,2,1,0], [2,5,3,7,1], [1,0,1,0,1]]
>>> tree = spatial.KDTree(A)
>>> tree.query([0.5,0.5,0.5,0.5,0.5])
(1.1180339887498949, 3)
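
The pre-processing mentioned above could look roughly like the sketch below. It assumes skills are stored as {skill: level} mappings and that a missing skill can be treated as 0; both are assumptions made for illustration. The task names and their requirement values are hypothetical, and only the employee's values come from the question.

```python
from scipy import spatial

# Fixed skill order so every vector has the same length and layout.
SKILLS = ["analysis", "patience", "comprehension", "communication", "creativity"]

def to_vector(levels, default=0.0):
    """Turn a {skill: level} mapping into a list ordered like SKILLS.

    Missing skills get `default` (an assumption, not a rule from the thread).
    """
    return [float(levels.get(skill, default)) for skill in SKILLS]

# Hypothetical task requirements on the 0-10 scale.
tasks = {
    "reporting": {"analysis": 9, "patience": 5, "comprehension": 7,
                  "communication": 7, "creativity": 8},
    "support": {"analysis": 3, "patience": 9, "comprehension": 6,
                "communication": 9},  # no creativity requirement listed
    "design": {"analysis": 5, "patience": 4, "comprehension": 6,
               "communication": 6, "creativity": 10},
}
names = list(tasks)
tree = spatial.KDTree([to_vector(tasks[name]) for name in names])

# The employee from the question.
employee = {"analysis": 8.5, "patience": 5, "comprehension": 7,
            "communication": 7.5, "creativity": 8}

# k=2 asks for the two nearest tasks instead of only the closest one.
distances, indices = tree.query(to_vector(employee), k=2)
for d, i in zip(distances, indices):
    print(names[i], round(float(d), 3))
```

Passing k to query returns the k nearest tasks in order of increasing distance, which is convenient when a shortlist is wanted rather than a single answer.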

You don't need machine learning to do this.

You can subtract the vector describing the employee from the vector of each task, and calculate the magnitude of the resulting difference vectors.

The difference vector with the smallest magnitude belongs to the best-matching task.
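
The question also asks for a matching percentage, which the magnitude alone does not give. One possible convention, sketched below, is to scale each distance by the largest distance possible on the 0 to 10 scale and subtract it from 100%. This formula is an assumption chosen for illustration, not something stated in the thread, and the task values are hypothetical.

```python
import numpy as np

SCALE_MAX = 10.0  # values are on a 0-10 scale, as stated in the question

def match_percentages(tasks, employee, scale_max=SCALE_MAX):
    """Map Euclidean distances to a 0-100 score (assumed convention).

    100 means identical vectors; 0 means the vectors sit at opposite
    corners of the scale on every skill.
    """
    tasks = np.asarray(tasks, dtype=float)
    employee = np.asarray(employee, dtype=float)
    distances = np.linalg.norm(tasks - employee, axis=1)
    # Largest possible distance: a gap of scale_max on every one of the n skills.
    max_distance = scale_max * np.sqrt(tasks.shape[1])
    return 100.0 * (1.0 - distances / max_distance)

# Employee from the question: analysis, patience, comprehension,
# communication, creativity.
employee = [8.5, 5, 7, 7.5, 8]

# Hypothetical task requirements in the same skill order.
tasks = [
    [9, 5, 7, 7, 8],
    [3, 9, 6, 9, 0],
    [8.5, 5, 7, 7.5, 8],  # identical to the employee
]

pct = match_percentages(tasks, employee)
best = int(np.argmax(pct))
print(pct.round(1), "best task index:", best)
```

Note that this score penalizes exceeding a requirement as much as falling short of it; whether that is desirable depends on the use case.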
