Regression with a feature which has its own depth
I'm relatively new to ML/statistical analysis, and I'm working with a dataset structured like this:
person_id, pay, task, hours
1, 560, A, 3
1, 560, B, 5
2, 650, A, 7
3, 520, C, 6
3, 520, A, 2
...
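This means that person 1 is paid a total of 560 for working 3 hours on task A and 5 hours on task B; person 2 is paid 650 for 7 hours on task A; person 3 is paid 520 for 6 hours on task C and 2 hours on task A; and so on. In other words, pay is a single total per person, repeated on each of that person's rows. I hope that's clear.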
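To make the layout concrete, here is a minimal pandas sketch (an illustration, not part of my actual pipeline) that loads the sample rows in this "long" format and collapses them to one row per person: one hours column per task (0 when a person didn't do that task) and a single pay value per person.

```python
import pandas as pd

# Long format: one row per (person, task) pair, as in the table above.
long_df = pd.DataFrame(
    {
        "person_id": [1, 1, 2, 3, 3],
        "pay": [560, 560, 650, 520, 520],
        "task": ["A", "B", "A", "C", "A"],
        "hours": [3, 5, 7, 6, 2],
    }
)

# Pay is a per-person total repeated on each of that person's rows, so keep one value.
pay = long_df.groupby("person_id")["pay"].first()

# Wide format: one row per person, one column of hours per task; absent tasks become 0.
hours_wide = long_df.pivot_table(
    index="person_id", columns="task", values="hours", aggfunc="sum", fill_value=0
)

print(hours_wide)
print(pay)
```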
I'd like to fit a regression where the inputs X are the (task, hours) pairs and the target Y is the per-person pay, but I haven't yet figured out how to approach the problem. My toolbox would preferably be Python + scikit-learn, but a generic discussion would be useful as well.
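For reference, this is roughly what the "flatten and regress" option would look like with scikit-learn's `LinearRegression`. It is only a sketch under an assumption of mine: pay is approximately the sum of hours times a per-task hourly rate, with no fixed component (hence `fit_intercept=False`), so each coefficient reads as an estimated rate for one task. With just the three sample people and three task columns the system is square and the fit is exact, so the numbers mean nothing; real data would need many more people than task columns.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

long_df = pd.DataFrame(
    {
        "person_id": [1, 1, 2, 3, 3],
        "pay": [560, 560, 650, 520, 520],
        "task": ["A", "B", "A", "C", "A"],
        "hours": [3, 5, 7, 6, 2],
    }
)

# Features: hours per task, one row per person (missing tasks -> 0 hours).
X = long_df.pivot_table(
    index="person_id", columns="task", values="hours", aggfunc="sum", fill_value=0
)
# Target: one pay value per person, aligned to the same person order as X.
y = long_df.groupby("person_id")["pay"].first().loc[X.index]

# Assumption: pay ~ sum over tasks of (rate_task * hours_task), no fixed part.
model = LinearRegression(fit_intercept=False)
model.fit(X, y)

for task, rate in zip(X.columns, model.coef_):
    print(f"estimated rate for task {task}: {rate:.2f}")
```

If a fixed per-person component is plausible, keeping the intercept (and switching to a regularized model such as `Ridge` when people are few) would be the obvious variation.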
Put differently, the data look like this:
person_id, pay, tasks
[1, 560, [[A, 3], [B, 5]]]
[2, 650, [[A, 7]]]
[3, 520, [[C, 6], [A, 2]]]
...
Here person_id is a high-cardinality feature that can safely be ignored, and the label Y is pay. The tasks feature (X), however, has its own structure: each entry has a fixed shape (two fields here, task and hours), but the number of entries per person (its depth) is not predetermined, although it is bounded (maybe 5-10 distinct tasks). I can't see how to fit such structured feature data into a regression setting. Should I flatten the tasks, with one column per possible task (A hours, B hours, C hours, etc.), or is a more general approach possible?
Moreover, this is a simplified version of my problem. The real task structure could include more dimensions, in which case the number of flattened features would quickly explode, since there would be one column for every possible combination of values.
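For the case with more dimensions, two things may help. First, a sparse encoding such as `DictVectorizer(sparse=True)` only creates columns for combinations that actually occur, not for all possible ones. Second, scikit-learn's `FeatureHasher` caps the width at a fixed `n_features` regardless of how many combinations exist, at the cost of possible hash collisions and coefficients that can no longer be mapped back to named features. The sketch below assumes a hypothetical extra field, "site", invented purely for illustration, and `task_features` is likewise a hypothetical helper. Each entry contributes its hours to a per-task key, a per-site key and their combination, so a model can share information across related combinations.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher

# Hypothetical richer structure: each entry is [task, site, hours].
records = [
    (1, 560, [["A", "north", 3], ["B", "south", 5]]),
    (2, 650, [["A", "south", 7]]),
    (3, 520, [["C", "north", 6], ["A", "north", 2]]),
]


def task_features(tasks):
    """Hypothetical helper: hours per task, per site, and per (task, site) pair."""
    feats = {}
    for task, site, hours in tasks:
        for key in (f"task={task}", f"site={site}", f"task={task}|site={site}"):
            feats[key] = feats.get(key, 0) + hours
    return feats


# Output width is fixed by n_features, however many combinations exist.
# alternate_sign=False keeps colliding values additive instead of sign-flipped.
hasher = FeatureHasher(n_features=2**8, input_type="dict", alternate_sign=False)
X = hasher.transform([task_features(tasks) for _, _, tasks in records])
y = np.array([pay for _, pay, _ in records])

print(X.shape)  # (3, 256)
```

`X` is a SciPy sparse matrix, which linear models such as `Ridge` or `SGDRegressor` accept directly.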
Any help is welcome and appreciated. Thanks!