Best Technologies for Opening Large Sets of Sensor Time-Series Data to Analytics
My team is exploring options to create a robust analytics capability that is well-suited for our large quantities of sensor test data. I'd appreciate any suggestions for technologies that would perform well for my use case.
About my data:
- For each test, we process binary recordings into flat files for each end user (roughly 5 to 15 files per test, across hundreds of tests per year)
- Each file contains time-series data for 100 to 1000 parameters
- Parameter sample rates range from 20 samples per second to 10,000 sps
- Each file contains one or more time cuts
- Time cuts may span recorder-on to recorder-off (~2 hours) or cover specific shorter events (20-60 seconds on average)
- Sets of parameters share the same time array
- Some parameters are continuously-changing (e.g. a temperature measurement), while others rarely change (e.g. a fault code)
Currently, the flat file format we use serves us very well in terms of compression, performance, and delivering quality data to our end users. The format is HDF5 with a prescribed internal structure: repeating values are compressed with run-length encoding (RLE), and time index arrays are shared by multiple parameters (rather than recreated for each one) where applicable.
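For context, here is a rough, hypothetical sketch of the kind of layout I'm describing (file, group, and parameter names and the change-record scheme are illustrative only, not our actual schema), using h5py:

```python
# Hypothetical illustration of the layout described above (not the actual schema):
# one shared time dataset per sample-rate group, full arrays for continuously-
# changing parameters, and (index, value) change records for rarely-changing ones.
import numpy as np
import h5py

rng = np.random.default_rng(0)
n = 20 * 3600 * 2          # e.g. a ~2 h time cut at 20 sps

with h5py.File("test_0001.h5", "w") as f:
    cut = f.create_group("time_cuts/cut_000")

    # One time array shared by every parameter sampled at this rate
    grp = cut.create_group("rate_20sps")
    grp.create_dataset("time", data=np.arange(n) / 20.0, compression="gzip")

    # Continuously-changing parameter: store every sample
    grp.create_dataset("engine_temp",
                       data=rng.normal(90, 2, n).astype("f4"),
                       compression="gzip")

    # Rarely-changing parameter: store only the changes (RLE-style)
    faults = grp.create_group("fault_code")
    faults.create_dataset("change_index", data=np.array([0, 50_000, 120_000]))
    faults.create_dataset("value", data=np.array([0, 3, 0], dtype="i2"))
```

Storing the rarely-changing parameters as change records rather than full sample arrays is where most of the RLE-style savings come from.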
What technologies would work well to open this data up to data analytics? I'm hoping to maximize efficiency, performance, data compression, and data mining capabilities.
We've experimented with InfluxDB, which enabled great data mining capability out of the box, but I/O seems quite slow, and compression does not appear to be very effective compared to our flat file format.
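For reference, our ingest path was along these lines (a simplified sketch, not our exact code; the bucket/org names, the file layout from the sketch above, and the point-per-sample pattern are all illustrative, assuming the influxdb-client Python package against an InfluxDB 2.x server):

```python
# Simplified sketch of a point-per-sample ingest from the HDF5 layout above.
import h5py
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

with h5py.File("test_0001.h5", "r") as f:
    grp = f["time_cuts/cut_000/rate_20sps"]
    times = grp["time"][:]
    values = grp["engine_temp"][:]

# One point per sample, tagged with the test ID
points = [
    Point("sensor_data")
    .tag("test_id", "test_0001")
    .field("engine_temp", float(v))
    .time(int(t * 1e9), WritePrecision.NS)
    for t, v in zip(times, values)
]
write_api.write(bucket="sensor-tests", record=points)
```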
Thanks in advance for any leads!
Tags: data-engineering, time-series, data-mining
Category: Data Science