How to input sets as features

Need advice on the best way to represent the data below so it can be fed into an ML algorithm (yet to be decided).

This is from the online order processing domain. An order consists of a variable number of items. Each item can be located in a variable number of warehouses. The entire order, with its multiple items and the multiple warehouses per item, needs to be processed as one training sample. The goal is to learn a function that outputs the warehouses from which the items can be picked, under some rules/conditions, so as to minimize processing costs. The number of items can run into the millions and the number of stores into the thousands.

I've been looking at representing these as permutation-invariant sets. Is there a simpler way, or is that the right way to go about it?



In ML you really need good examples for which the outcome is known, plus the cases for which you don't know the outcome. You learn from the good examples and then apply this "knowledge" to the cases whose outcome you wish to know.

I agree that Mathematical Optimisation would probably be a better route to take in a problem such as this.
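To make the optimisation route concrete, here is a minimal sketch for a single order. Everything in it is an assumption for illustration: the item and warehouse names, the `pick_cost` table, and the cost model (a per-item pick cost plus a fixed `SHIPMENT_COST` for every distinct warehouse used) are all made up, since the question does not state the actual rules. It simply tries every feasible assignment, which only works for tiny orders.

```python
from itertools import product

# Hypothetical data: the warehouses that stock each item of one order.
availability = {
    "banana": ["W1", "W2"],
    "apple": ["W2", "W3"],
    "toy": ["W1", "W3"],
}

# Hypothetical cost of picking a given item at a given warehouse.
pick_cost = {
    ("banana", "W1"): 1.0, ("banana", "W2"): 2.0,
    ("apple", "W2"): 1.0, ("apple", "W3"): 1.5,
    ("toy", "W1"): 2.0, ("toy", "W3"): 1.0,
}

# Hypothetical fixed cost charged once per distinct warehouse used.
SHIPMENT_COST = 3.0


def cheapest_assignment(availability, pick_cost, shipment_cost):
    """Brute-force search over every item -> warehouse assignment."""
    items = sorted(availability)
    best_cost, best_plan = float("inf"), None
    # One candidate warehouse per item, all combinations.
    for choice in product(*(availability[item] for item in items)):
        cost = sum(pick_cost[(item, wh)] for item, wh in zip(items, choice))
        cost += shipment_cost * len(set(choice))
        if cost < best_cost:
            best_cost, best_plan = cost, dict(zip(items, choice))
    return best_plan, best_cost


plan, cost = cheapest_assignment(availability, pick_cost, SHIPMENT_COST)
print(plan, cost)
```

The search grows exponentially with the number of items, so at the scale described in the question you would hand the same objective and constraints to an integer programming solver rather than enumerate assignments.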

Alternatively, if you want to imply some kind of connection between sets, you could create a categorical (dummy) variable that designates it. If I understand you correctly, here is an example:

         item1    item2   basket1  basket2  basket3  GroundTruth (target)
order 1  Banana   Apple   True     False    False    warehouse1
order 2  Toy      Carrot  False    True     False    warehouse2
order 3  Picture  Shoe    False    False    True     warehouse3

You could also encode which items are in which warehouses in a similar way. However, if you have lots of different items, lots of warehouses and lots of baskets, but relatively few training examples, this is going to get pretty sparse pretty quickly.
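As a rough sketch of that kind of encoding, the snippet below turns one order into a fixed-length 0/1 vector with one slot per (item, warehouse) pair. The `ITEMS` and `WAREHOUSES` vocabularies and the `encode_order` helper are hypothetical names made up for this example. Because each slot is addressed by the pair itself, the order in which items or warehouses are listed does not change the result.

```python
# Hypothetical vocabularies; a real catalogue would be far larger.
ITEMS = ["Apple", "Banana", "Carrot", "Picture", "Shoe", "Toy"]
WAREHOUSES = ["warehouse1", "warehouse2", "warehouse3"]

# Fixed position of every (item, warehouse) pair in the feature vector.
PAIR_INDEX = {
    (item, wh): k
    for k, (item, wh) in enumerate(
        (item, wh) for item in ITEMS for wh in WAREHOUSES
    )
}


def encode_order(order):
    """Encode {item: [warehouses stocking it]} as a flat 0/1 list."""
    vec = [0] * len(PAIR_INDEX)
    for item, warehouses in order.items():
        for wh in warehouses:
            vec[PAIR_INDEX[(item, wh)]] = 1
    return vec


# The same order written in two different listing orders.
order_a = {"Banana": ["warehouse1", "warehouse2"], "Apple": ["warehouse1"]}
order_b = {"Apple": ["warehouse1"], "Banana": ["warehouse2", "warehouse1"]}

vec_a = encode_order(order_a)
vec_b = encode_order(order_b)
density = sum(vec_a) / len(vec_a)  # fraction of non-zero features
print(vec_a == vec_b, len(vec_a), density)
```

Even in this toy case most slots are zero. The vector length is the number of items times the number of warehouses, so with millions of items and thousands of warehouses a dense layout like this is not practical and a sparse representation would be needed.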
