Sampling a dataset to match the average and variance of another dataset
I have a set of text datasets whose token lengths have the following average and variance:
Dataset1
avg = 28.18, var = 393.03
Dataset2
avg = 32.70, var = 644.79
Dataset3
avg = 36.94, var = 805.50
Dataset4
avg = 28.56, var = 436.86
Dataset5
avg = 53.13, var = 612.18
How can I sample a smaller subset of instances from Dataset5 whose avg and var are similar (or equal, if possible) to those of any of the datasets above?
I am using pandas DataFrames, where each dataset has 2 columns: [text, tokens_length].
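To make the setup concrete, here is a minimal sketch of the DataFrame layout and of how avg and var can be computed and compared against a target. The toy rows, the whitespace tokenizer, the helper names `length_stats` and `stats_gap`, and the choice of population variance (`ddof=0`) are all assumptions for illustration; the real data and tokenizer may differ.

```python
import pandas as pd

# Hypothetical toy rows standing in for one dataset (not the real data).
df = pd.DataFrame({"text": ["a b c", "a b", "a b c d e", "a"]})

# Assumption: tokens are whitespace-separated; the real tokenizer may differ.
df["tokens_length"] = df["text"].str.split().str.len()


def length_stats(frame):
    """Return (mean, variance) of the tokens_length column.

    ddof=0 is the population variance. pandas defaults to ddof=1
    (sample variance); which one applies here is an assumption.
    """
    col = frame["tokens_length"]
    return float(col.mean()), float(col.var(ddof=0))


def stats_gap(frame, target_avg, target_var):
    """Absolute difference between a frame's (avg, var) and a target pair."""
    avg, var = length_stats(frame)
    return abs(avg - target_avg), abs(var - target_var)


avg, var = length_stats(df)

# A plain random subset, shown only to illustrate how a candidate sample
# would be scored against target statistics such as Dataset1's (28.18, 393.03).
candidate = df.sample(n=2, random_state=0)
gap_avg, gap_var = stats_gap(candidate, 28.18, 393.03)
```

A subset is "similar" in the sense of the question when both values returned by `stats_gap` are small.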