Choosing between Storm+Trident-ML, Storm+SAMOA or Spark Streaming+MLlib

Question

Raman

13 November 2015, 07:17

I want to implement streaming Naive Bayes in a distributed system. What is the best approach to choosing a framework? Should I choose:

Storm alone, implementing streaming Naive Bayes on my own in a Storm topology
Storm + Trident-ML
Storm + SAMOA
Spark Streaming + MLlib

Which framework combination is the best one to choose and start working with? Any suggestions would be a great help.

Topic apache-spark classification distributed data-stream-mining machine-learning

Category Data Science

Luis Claudio Silveira · Accepted Answer · 13 November 2015, 07:17

It depends. If you need a fast way to mine data streams and train models adaptively as data arrives, SAMOA is the best tool, because it integrates easily with the Storm or S4 stream processing engines. If you only need to process batch data quickly and in a distributed manner, Spark MLlib would be the best solution among them.

Pramit · Accepted Answer · 4 June 2015, 04:05

If I were you, I would pick whichever of these frameworks I am most comfortable with and implement the use case. Spark Streaming + MLlib should work and would be my choice: its user base is growing, it is one of the most popular projects under the Apache umbrella, and it has solid enterprise backing, with both Cloudera and Hortonworks providing enterprise-level support. In theory, Spark Streaming lags behind Storm for stream processing, but the framework's appeal is that it lets you do streaming, ordinary map and reduce, graph processing and SQL under one roof. So once you have a pipeline that converts your data to RDDs, you are set for most common data analysis jobs. It is written from scratch in Scala, a very powerful language that scales well in a distributed setup when handling concurrency. Hope this helps; feel free to reach out to me with any questions you have.
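Whichever framework is chosen, the core of a streaming Naive Bayes learner is the same: keep running counts and update them one example at a time. The following is only a minimal, framework-agnostic sketch in plain Python, not code from Storm, Trident-ML, SAMOA or MLlib. The class name `StreamingNaiveBayes`, its methods, and the multinomial model with Laplace smoothing are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


class StreamingNaiveBayes:
    """Hypothetical multinomial Naive Bayes that learns one example at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_counts = defaultdict(int)      # examples seen per class
        self.token_counts = defaultdict(lambda: defaultdict(int))  # token counts per class
        self.token_totals = defaultdict(int)      # total tokens seen per class
        self.vocabulary = set()                   # all tokens seen so far

    def update(self, tokens, label):
        """Fold a single labelled example into the running counts."""
        self.class_counts[label] += 1
        for token in tokens:
            self.token_counts[label][token] += 1
            self.token_totals[label] += 1
            self.vocabulary.add(token)

    def merge(self, other):
        """Add another model's counts to this one (counts are additive)."""
        for label, n in other.class_counts.items():
            self.class_counts[label] += n
        for label, counts in other.token_counts.items():
            for token, c in counts.items():
                self.token_counts[label][token] += c
                self.token_totals[label] += c
        self.vocabulary |= other.vocabulary

    def predict(self, tokens):
        """Return the most probable label, or None if nothing has been seen."""
        total = sum(self.class_counts.values())
        if total == 0:
            return None
        v = len(self.vocabulary) + 1  # +1 reserves probability mass for unseen tokens
        best_label, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = self.token_totals[label] + self.alpha * v
            for token in tokens:
                count = self.token_counts[label].get(token, 0)
                score += math.log((count + self.alpha) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because the model state is nothing but counts, partial models trained on different slices of the stream can be combined with `merge`. That additivity is what makes the algorithm a reasonable fit for a distributed setup; how the updates and merges are scheduled would depend on the framework you pick.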
