Combining time-series data from different devices

I am using four different sensor models (four different devices) to gather biometric data (e.g. heart rate) for the same user over the same time period. In the ideal scenario all the devices are properly positioned, so for the time being we are not concerned about null values and bad readings.

The data follows this format:

timestamp: XXXXXXXXXX //unix epoch timestamp
duration: XXX //duration in seconds
value: XX
state: state //string that characterizes the state of the user based on the value

The devices have quite varied levels of sensitivity, and their individual graphs may contain different "state" sequences, both in terms of the number of appearances of each "state" and their individual timestamps.

I am now tasked with presenting the data in a single graph that combines the data from all devices and clearly presents the "state" of the user over time. Is there a way to do this without making arbitrary decisions regarding the accuracy of the devices or the significance of short-duration readings? (I am doing this in Python, if that matters.)



This answer assumes that duration is the same across devices, or that the data can be resampled so that it is (e.g. by taking the median over 1-minute intervals).
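A minimal sketch of that resampling step, using only the standard library. It assumes each reading is a (timestamp, value) pair and assigns a reading to the interval containing its start timestamp, ignoring the duration field; the function name and sample numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import median

def resample_median(readings, interval=60):
    """Bin (timestamp, value) readings into fixed-width intervals
    and return {interval_start: median value in that interval}."""
    bins = defaultdict(list)
    for ts, value in readings:
        bin_start = ts - ts % interval  # floor the timestamp to the interval start
        bins[bin_start].append(value)
    return {start: median(vals) for start, vals in sorted(bins.items())}

# Hypothetical heart-rate readings from one device (seconds, bpm)
device_a = [(0, 62), (20, 64), (50, 70), (65, 90), (100, 94)]
resampled = resample_median(device_a)
```

Running the same function on every device puts all series on a shared time grid, which the later comparisons rely on.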

You can present the numeric data (value) as a line plot, with each sensor's values as its own series. You may (or may not!) want to normalize the values within each series to make them more comparable.
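One possible way to normalize within a series is a z-score (subtract the series mean, divide by its standard deviation). This is only one choice among several and is an assumption here, not something the question prescribes; the helper name is hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Return the values rescaled to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no variation to scale; return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical readings from one sensor
normalized = zscore([60, 70, 80])
```

Note that normalizing hides absolute offsets between devices, so skip it if those offsets are what you want to show.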

For the state, map each distinct state to a numerical value, e.g. { "resting": 1, "running": 2, ... }. Then use a point plot with time on the X axis and this numeric state value on the Y axis. Label the Y ticks with the state names, and give each sensor its own color.
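A rough sketch of the state plot described above. The state names beyond "resting" and "running", the sensor names, and the function names are assumptions for illustration. matplotlib is imported only inside the plotting function, so the mapping helper works without it.

```python
# Hypothetical mapping; "walking" is an invented intermediate state.
STATE_LEVELS = {"resting": 1, "walking": 2, "running": 3}

def state_points(readings, levels=STATE_LEVELS):
    """Turn (timestamp, state) readings into x and y lists for a point plot."""
    xs = [ts for ts, _ in readings]
    ys = [levels[state] for _, state in readings]
    return xs, ys

def plot_states(series_by_sensor, levels=STATE_LEVELS):
    """Scatter each sensor's states over time; one color per sensor."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    for sensor, readings in series_by_sensor.items():
        xs, ys = state_points(readings, levels)
        ax.scatter(xs, ys, label=sensor, alpha=0.6)  # default color cycle differs per call
    ax.set_yticks(list(levels.values()))
    ax.set_yticklabels(list(levels.keys()))  # show state names instead of numbers
    ax.set_xlabel("time (unix epoch)")
    ax.legend()
    return fig

xs, ys = state_points([(0, "resting"), (60, "running"), (120, "walking")])
```

The numeric order of the levels is itself a choice: it only makes sense if the states have a natural ordering such as intensity.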

Compare and contrast

When working with multiple related data series, the similarities and dissimilarities between the series can often be just as informative as (or more informative than) each series on its own.

If the data is supposed to represent the same thing, consider computing the median of the value across all sensors, and then, for each sensor, its difference from that group median. Plot these differences in a line plot like the one above.
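A small sketch of the group-median comparison, assuming every sensor has already been resampled onto shared time bins and is stored as {bin_start: value}. The function name and sample data are hypothetical.

```python
from statistics import median

def deviations_from_median(values_by_sensor):
    """Return (group_median, deviation) where group_median is {bin: median across sensors}
    and deviation is {sensor: {bin: sensor value minus group median}}."""
    # Only compare bins that every sensor actually has.
    common = set.intersection(*(set(series) for series in values_by_sensor.values()))
    group = {t: median(series[t] for series in values_by_sensor.values())
             for t in sorted(common)}
    deviation = {sensor: {t: series[t] - group[t] for t in group}
                 for sensor, series in values_by_sensor.items()}
    return group, deviation

# Hypothetical resampled heart rates from three sensors
values = {
    "a": {0: 60, 60: 90},
    "b": {0: 62, 60: 95},
    "c": {0: 70, 60: 91},
}
group, deviation = deviations_from_median(values)
```

Plotting each sensor's deviation series around zero makes a consistently high or low device stand out without declaring any single device to be the accurate one.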

If you expect agreement in "state", you can compute the mode across the sensors, and then, for each sensor, find where its state differs from the mode. Plot this in a state plot similar to the one above, but now with "mode" as its own series, and for each sensor series plot only the values that differ from the mode. This can also be represented independently of time as a confusion matrix, either overall or for each sensor.
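The mode comparison could be sketched as follows, again assuming states on shared time bins stored as {bin_start: state}. All names and data are illustrative. One caveat stated up front: when sensors tie, this sketch picks the state seen first in sensor order, which is an arbitrary tie-break you may want to replace.

```python
from collections import Counter

def modal_state(states_by_sensor):
    """Most common state across sensors for each shared bin (ties: first seen wins)."""
    common = sorted(set.intersection(*(set(s) for s in states_by_sensor.values())))
    return {t: Counter(s[t] for s in states_by_sensor.values()).most_common(1)[0][0]
            for t in common}

def disagreements(states_by_sensor, mode):
    """For each sensor, keep only the bins where its state differs from the mode."""
    return {sensor: {t: series[t] for t in mode if series[t] != mode[t]}
            for sensor, series in states_by_sensor.items()}

def confusion(series, mode):
    """Count (modal state, sensor state) pairs: a time-independent confusion table."""
    return Counter((mode[t], series[t]) for t in mode)

# Hypothetical states from three sensors over three bins
states = {
    "a": {0: "resting", 60: "running", 120: "running"},
    "b": {0: "resting", 60: "resting", 120: "running"},
    "c": {0: "resting", 60: "running", 120: "resting"},
}
mode = modal_state(states)
diff = disagreements(states, mode)
table_b = confusion(states["b"], mode)
```

With an even number of sensors, such as the four in the question, ties are possible, so it may be worth plotting tied bins separately rather than relying on the tie-break.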
