Optimally sample from multiple distributions
I have two datasets, both of the form shown in the table below. I want to downselect from dataset A by sampling according to the distribution of values in dataset B. However, I want to consider both Distance and Duration when downselecting, so that the distribution of both parameters in my final subset of dataset A matches, as closely as possible, the distribution of these parameters in dataset B.
Does anyone have suggestions for tools (preferably in Python) that would help me here?
ID | Distance | Duration |
---|---|---|
1 | 5 | 17 |
2 | 9 | 20 |
3 | 2 | 100 |
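
For concreteness, here is a minimal sketch of one possible way to approach this kind of downselection, not necessarily the best one. It assumes both datasets are pandas DataFrames with `Distance` and `Duration` columns, bins the two columns on a shared 2D grid, and samples rows of A with weights proportional to (count of B in the bin) / (count of A in the bin). The function name `downselect_to_match` and the parameters `n`, `bins`, and `seed` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def downselect_to_match(a, b, n, cols=("Distance", "Duration"), bins=10, seed=0):
    """Pick n rows of `a` whose 2D histogram over `cols` roughly follows that of `b`.

    This is an approximate, histogram-based importance resampling sketch.
    """
    rng = np.random.default_rng(seed)
    x, y = cols

    # Shared bin edges that cover the range of both datasets.
    x_edges = np.histogram_bin_edges(pd.concat([a[x], b[x]]), bins=bins)
    y_edges = np.histogram_bin_edges(pd.concat([a[y], b[y]]), bins=bins)

    # Joint counts per bin for the source (A) and the target (B).
    hist_a, _, _ = np.histogram2d(a[x], a[y], bins=[x_edges, y_edges])
    hist_b, _, _ = np.histogram2d(b[x], b[y], bins=[x_edges, y_edges])

    # Bin index of every row of A (the maximum value goes in the last bin).
    ix = np.clip(np.digitize(a[x], x_edges) - 1, 0, bins - 1)
    iy = np.clip(np.digitize(a[y], y_edges) - 1, 0, bins - 1)

    # Weight = target count / source count; each row of A sits in a bin
    # that contains at least itself, so the denominator is never zero.
    weights = hist_b[ix, iy] / hist_a[ix, iy]
    if weights.sum() == 0:
        raise ValueError("A and B do not overlap on this grid")
    weights = weights / weights.sum()

    # Sampling without replacement needs at least n rows with non-zero
    # weight; NumPy raises ValueError otherwise.
    chosen = rng.choice(len(a), size=n, replace=False, p=weights)
    return a.iloc[chosen]
```

The match is only as fine as the bin grid, and sampling without replacement distorts the result when `n` is large relative to the number of rows of A in the bins that B occupies, so comparing the histograms of the selected subset and of B afterwards is a sensible check.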