What's the best way to do classification based on two given datasets (annual data and daily data)?
I want to do binary classification based on two given datasets. The first is annual statistical data for each company and includes the label I need to predict:
company_id | year | annual sales | something else... | label
0 | 2017 | 2000320 | ... | 0
0 | 2018 | 4002530 | ... | 0
0 | 2019 | 800050 | ... | 1
1 | 2017 | 1024380 | ... | 1
1 | 2018 | 7085521 | ... | 0
1 | 2019 | 4525252 | ... | 0
2 | 2017 | 25258770 | ... | 0
2 | 2018 | 95402000 | ... | 1
2 | 2019 | 8605200 | ... | 0
The other dataset is daily statistical data for each company:
company_id | year | date(MM-dd) | daily sales | something else...
0 | 2017 | 12-02 | 5210 | ...
0 | 2017 | 12-03 | 3542 | ...
0 | 2017 | 12-04 | 8575 | ...
0 | 2017 | 12-06 | 1254 | ...
0 | 2017 | ... | ... | ...
0 | 2018 | 12-01 | 1352 | ...
0 | 2018 | 12-02 | 4856 | ...
0 | 2018 | ... | ... | ...
0 | 2019 | 12-01 | 4583 | ...
0 | 2019 | ... | ... | ...
1 | 2017 | 12-01 | 5210 | ...
1 | 2017 | ... | ... | ...
1 | 2018 | 12-01 | 5202 | ...
1 | 2018 | ... | ... | ...
1 | 2019 | 12-01 | 8675 | ...
1 | 2019 | ... | ... | ...
I am wondering what the best way is to make full use of both datasets to predict the label of each company.
Is there a related topic I could read up on? I am happy to do some searching myself.
I am considering a left join of the annual dataset onto the daily dataset, but then many rows would share the same values in the annual features, and the dataset would grow dramatically.
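To make the concern concrete, here is a minimal pandas sketch using a subset of the sample rows above. It compares the join I am considering with one possible alternative: summarising the daily rows per (company_id, year) and attaching those summaries to the annual table. The column names `annual_sales` and `daily_sales` and the chosen summary statistics (mean, std, max, count) are my own assumptions for illustration, not something the datasets prescribe.

```python
import pandas as pd

# Annual table: one row per (company_id, year), carries the label.
annual = pd.DataFrame({
    "company_id":   [0, 0, 0, 1, 1, 1],
    "year":         [2017, 2018, 2019, 2017, 2018, 2019],
    "annual_sales": [2000320, 4002530, 800050, 1024380, 7085521, 4525252],
    "label":        [0, 0, 1, 1, 0, 0],
})

# Daily table: many rows per (company_id, year), no label.
daily = pd.DataFrame({
    "company_id":  [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    "year":        [2017, 2017, 2017, 2017, 2018, 2018, 2019, 2017, 2018, 2019],
    "date":        ["12-02", "12-03", "12-04", "12-06", "12-01", "12-02",
                    "12-01", "12-01", "12-01", "12-01"],
    "daily_sales": [5210, 3542, 8575, 1254, 1352, 4856, 4583, 5210, 5202, 8675],
})

keys = ["company_id", "year"]

# Option A: join the annual columns onto every daily row.
# The result has one row per daily record, so annual values and labels repeat.
joined = daily.merge(annual, on=keys, how="left")

# Option B (assumed alternative): collapse the daily rows into summary
# features per (company_id, year), then attach them to the annual table.
daily_features = (
    daily.groupby(keys)["daily_sales"]
    .agg(daily_mean="mean", daily_std="std", daily_max="max", n_days="count")
    .reset_index()
)
# daily_std is NaN for groups with a single daily row.
features = annual.merge(daily_features, on=keys, how="left")

print(len(joined), len(features))
```

With option A the row count follows the daily table, while option B keeps one row per labelled (company_id, year) pair, which matches the level at which the label is defined.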
Topic finance classification data-mining
Category Data Science