How to have multiple labels in a single video?

I am building a tennis stroke classification system using a CNN.

I assume each stroke contains 3 steps/classes ('Ready', 'Impact', 'Finish'). I want to train a model that predicts which of these steps/classes an input video contains.

I have tried training 3 separate models, one per step, each as a binary classifier.

Example of the classes for one step's model:

1 - ready  
0 - not-ready (the other, incorrect steps)

But this method failed, since the 'not-ready' class contains many more features. I got only 4% accuracy.

Can anyone help me find a solution to this problem?



Given that you have only 3 classes and that they closely depend on each other, I think it's worth trying a multiclass setting, as WBM said. The idea is to label each video with the full combination of actions it contains. With 3 actions that are each either present or absent, there are at most 2^3 = 8 combinations:

  • R-I-F
  • R-I
  • R-F
  • R
  • I-F
  • I
  • F
  • none
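As a minimal sketch of this labelling scheme, the snippet below maps a (ready, impact, finish) tuple of 0/1 flags to a single class id and back. The function names `encode`, `decode` and `class_name` and the bit ordering are illustrative assumptions, not something taken from the question.

```python
from itertools import product

# Short names for the three steps, in a fixed order (assumed convention).
STEPS = ("R", "I", "F")  # Ready, Impact, Finish


def encode(flags):
    """Map a (ready, impact, finish) tuple of 0/1 flags to a class id in 0..7."""
    class_id = 0
    for flag in flags:
        class_id = class_id * 2 + int(bool(flag))
    return class_id


def decode(class_id):
    """Inverse of encode: recover the three 0/1 flags from a class id."""
    return tuple((class_id >> shift) & 1 for shift in (2, 1, 0))


def class_name(class_id):
    """Readable name such as 'R-I-F', 'R-F' or 'none'."""
    present = [step for step, flag in zip(STEPS, decode(class_id)) if flag]
    return "-".join(present) if present else "none"


if __name__ == "__main__":
    # Enumerate all 2^3 combinations in the same order as the list above.
    for flags in product((1, 0), repeat=3):
        print(encode(flags), class_name(encode(flags)))
```

Because `decode` inverts `encode`, a predicted class id can be turned back into the three per-step flags, so existing multi-label annotations can be reused without relabelling.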

Some combinations of actions are probably impossible, so the number of classes is likely fewer than 8. Why this is a reasonable approach:

  • The setup is exactly the same, i.e. you can use the same labels, and the predictions can be used the same way as in your multi-label approach.
  • This is a "joint model", i.e. a model which learns everything together and can therefore exploit fine-grained distinctions between classes (e.g. between R-I-F and R-I).

However, note that this kind of method may require more data; in particular, it needs enough instances of each class.
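One way to check the data requirement before training is to count how many videos fall into each combination. The sketch below assumes each video is annotated with a (ready, impact, finish) tuple of 0/1 flags; the sample labels, the helper names and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def class_counts(labels):
    """Count how many videos carry each (ready, impact, finish) combination."""
    return Counter(tuple(label) for label in labels)


def rare_classes(labels, min_count):
    """Return the combinations that have fewer than min_count examples."""
    counts = class_counts(labels)
    return sorted(combo for combo, n in counts.items() if n < min_count)


if __name__ == "__main__":
    # Hypothetical annotations, one tuple per video.
    labels = [(1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1)]
    print(class_counts(labels))
    print(rare_classes(labels, min_count=2))
```

Combinations that never occur can simply be dropped from the class list, and very rare ones are candidates for collecting more data or merging with a neighbouring class.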
