Feature extraction; similarity and classification of accelerometer data

I have several experts performing the same specific action (for example, a squat or a leap forward) multiple times. Say 5 people do 100 squats each. Each has an accelerometer attached to the same body part. I record the accelerometer readings and get 100*5 = 500 data samples. They repeat this for several different actions (squat, push-up, leap forward, etc.). Each action is recorded as follows:

  1. Start recording (push button)
  2. Do the action
  3. Stop recording (push button)

Now I need to check that another person performs the actions in the correct order, for example: squat, leap forward, stand up, drop down, push-up. I take his accelerometer data and continuously feed it to a classifier, which needs to tell me whether he has just done a squat and not a leap or a push-up. Once the first action (the squat) has been identified, I check for the leap forward, and so on.

There are several problems with this:

  1. The data samples have different numbers of values, since some people squat a bit slower and others a bit faster. So some data samples have 250 XYZ values, others have 220 or 270, etc. (within a range of +-50). For now I work around this with stricter rules: I discard all data samples that exceed 250 readings, and for those with fewer than 250 values I append values from the beginning of the sample to its end until it has 250 in total. This works fine, since every action has a windup where the person stands still for a brief moment before performing the action. It is not optimal, though, because the experts need to redo the action if they were too slow (the windup was too long), and I am appending fake data. What would be a better way to handle this?
  2. For now I am using Random Forest and AdaBoost classifiers on low/high-pass filtered accelerometer data that I map to 750 columns (250 X, 250 Y, 250 Z) plus 1 class column. The prediction tells me something like 70% leap, 25% squat, 5% push-up. The classification is sometimes wrong or not precise enough, so I was thinking of extracting some features from my signal series and feeding those to the algorithm instead. My problem is that I do not know which features to extract.

The majority of papers that I found focus on human activity recognition that differentiates between walking, running, and ascending or descending stairs. They were not very helpful, because they work with a continuous data flow of a person walking or running for hours and they use much more sample data. In my task, by contrast, the data set instances are separate from each other.

I am not asking anyone to solve the whole task for me, just to point me in a direction with a good explanation of why it might be useful.

Topic sensors classification feature-extraction machine-learning

Category Data Science


Regarding your first problem: I would suggest that you do not discard samples even if they have more or fewer than 250 values. Instead, you can aggregate the accelerometer values over time intervals and then tie the aggregated values to a single action.
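One way to read "aggregate over time intervals" is to split every recording into the same number of equal-duration bins and average the readings inside each bin. The sketch below assumes that interpretation; the function name aggregate_to_bins, the choice of 25 bins, and the random placeholder recordings are all hypothetical and not part of the original answer.

```python
import numpy as np

def aggregate_to_bins(recording, n_bins=25):
    """Average an (n_samples, 3) XYZ recording over n_bins equal time intervals.

    Recordings of any length >= n_bins map to the same (n_bins, 3) shape,
    so nothing has to be discarded or padded.
    """
    recording = np.asarray(recording, dtype=float)
    if recording.ndim != 2 or recording.shape[0] < n_bins:
        raise ValueError("expected a 2-D array with at least n_bins rows")
    # array_split tolerates lengths that are not a multiple of n_bins
    chunks = np.array_split(recording, n_bins, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

# Placeholder recordings with different lengths (random data, for shape only)
rng = np.random.default_rng(0)
recordings = [rng.normal(size=(n, 3)) for n in (220, 250, 270)]

# One fixed-length feature row per recording: 25 bins * 3 axes = 75 columns
features = np.stack([aggregate_to_bins(r).ravel() for r in recordings])
print(features.shape)  # (3, 75)
```

Averaging within bins also smooths the signal, so a very small number of bins may blur short movements; the bin count is something to tune on your own data.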

Regarding your second problem: You have around 750 columns of data. It would be difficult for a random forest to reach high accuracy on that many raw columns. Apply dimensionality reduction first and then feature extraction. You can start with PCA (principal component analysis) on the 750 independent variables: reduce them to 2 or 3 variables and check how much variance these reduced variables explain. If it is less than 60%, you can try the t-SNE algorithm to extract further structure.

P.S. To check whether your reduced variables can explain your dependent variable (squat, sitting, push-up), draw a scatter plot of the reduced variable values and color the points by the dependent variable. The link below explains the idea:
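A minimal sketch of the "reduce and check the explained variance" step. It computes PCA directly with NumPy's SVD instead of a dedicated library, and the matrix X is random placeholder data standing in for the 500 x 750 table from the question; the function name pca_explained_variance is hypothetical.

```python
import numpy as np

def pca_explained_variance(X, n_components=3):
    """Project X onto its first principal components.

    Returns the projected samples and the fraction of total variance
    explained by each of the kept components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # PCA requires centered columns
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s ** 2                       # proportional to component variance
    ratio = variance / variance.sum()
    scores = Xc @ Vt[:n_components].T       # samples in the reduced space
    return scores, ratio[:n_components]

# Placeholder for the real data: 500 samples x 750 columns
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 750))

scores, ratio = pca_explained_variance(X, n_components=3)
print(scores.shape)                         # (500, 3)
print("explained variance:", ratio.sum())   # compare against the 60% rule of thumb
```

On real recordings the explained variance will differ greatly from this random placeholder, which has almost no shared structure between columns.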

https://blog.bioturing.com/2018/06/18/how-to-read-pca-biplots-and-scree-plots/


First, you need to create a separate model for every expert, because the way one expert performs an activity is totally different from the way another does.

Addressing your first problem:

Appending fake accelerometer data will bias the results toward one activity, because in the worst case 50 points out of 250, i.e. 20% of the data, are added artificially. Instead, use the number of points actually recorded for each action performed by the expert, without appending fake or augmented data. You can discard the points beyond 250; that will not affect the prediction much.

Addressing your second problem:

You can extract statistical features per axis, such as the max, min, mean, and standard deviation of X, Y, and Z. You can also use the SMA (signal magnitude area): the sum of |X| + |Y| + |Z| over all samples in the recording, often divided by the number of samples. The SMA is used to distinguish periods of mobility (activity) from periods of rest in a time series.

You can then validate the correlation between the features and the activity classes.
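The features listed above can be sketched as follows. The function name extract_features is hypothetical, and the SMA here is computed as the per-sample |X| + |Y| + |Z| averaged over the recording, which is one common normalisation and an assumption beyond what the answer spells out.

```python
import numpy as np

def extract_features(recording):
    """Turn an (n_samples, 3) XYZ recording into a 13-value feature vector.

    Layout: max(X,Y,Z), min(X,Y,Z), mean(X,Y,Z), std(X,Y,Z), SMA.
    The result has the same length regardless of how long the recording is.
    """
    recording = np.asarray(recording, dtype=float)
    per_axis = np.concatenate([
        recording.max(axis=0),
        recording.min(axis=0),
        recording.mean(axis=0),
        recording.std(axis=0),
    ])
    # Signal magnitude area: |x| + |y| + |z| per sample, averaged over time
    sma = np.abs(recording).sum(axis=1).mean()
    return np.append(per_axis, sma)

# Tiny hand-checkable example with two samples
example = [[1.0, -2.0, 3.0],
           [3.0, -4.0, 5.0]]
print(extract_features(example))
# [ 3. -2.  5.  1. -4.  3.  2. -3.  4.  1.  1.  1.  9.]
```

Because every recording maps to 13 values, this also sidesteps the variable-length problem: no padding or truncation is needed before training the classifier.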
