How is H2O faster than R or SAS?
I am trying to understand, at a high level, what makes H2O faster than R and SAS for data science computations.
Topic sas r performance bigdata machine-learning
Category Data Science
I have used R, SAS Base and H2O. First, I do not think that H2O seeks to be either R or SAS.
H2O provides data mining algorithms that are highly efficient. You can interface with H2O through several APIs, such as its R API. The benefit of combining R and H2O is that H2O is very good at exploiting multiple cores or clusters with minimal effort from the user. It is much harder to achieve the same efficiency in R alone.
H2O is much faster mainly because it indexes its data very well and its algorithms are written to exploit parallelism as fully as possible. See http://h2o.ai/blog/2014/03/h2o-architecture/
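To make the parallelism point concrete, here is a minimal Python sketch of chunked map/reduce: a numeric column is split into chunks, each chunk is summarized on a separate worker process (map), and the partial summaries are merged into one result (reduce). H2O is commonly described as running this kind of map/reduce over compressed, column-oriented chunks of data, but this is only an illustration of the general technique on a single machine, not H2O's code or API. The names `Partial`, `map_chunk`, `reduce_partials`, `split` and `parallel_mean_var` are hypothetical.

```python
"""Conceptual sketch of chunked map/reduce (NOT H2O code).

A column is split into chunks, each chunk is reduced to a small summary on
its own worker process, and the summaries are then combined.
"""
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Partial:
    """Sufficient statistics for the mean/variance of one chunk."""
    n: int
    mean: float
    m2: float  # sum of squared deviations from the chunk mean


def map_chunk(chunk: Sequence[float]) -> Partial:
    """Map step: summarize one chunk with Welford's online algorithm."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Partial(n, mean, m2)


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Reduce step: merge two chunk summaries (Chan et al. pairwise formula)."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Partial(n, mean, m2)


def split(data: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Cut the column into contiguous chunks of at most chunk_size values."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def parallel_mean_var(
    data: Sequence[float], chunk_size: int = 100_000, workers: int = 4
) -> Tuple[float, float]:
    """Return (mean, sample variance), mapping chunks across worker processes."""
    chunks = split(data, chunk_size)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total = Partial(0, 0.0, 0.0)
    for p in partials:
        total = reduce_partials(total, p)
    variance = total.m2 / (total.n - 1) if total.n > 1 else float("nan")
    return total.mean, variance
```

Call `parallel_mean_var(values)` from inside an `if __name__ == "__main__":` guard, because `ProcessPoolExecutor` may re-import the module in its worker processes. Note that this sketch copies every chunk to a worker process; that copying overhead is exactly what a system built for parallel analytics tries to keep small, which is part of why a plain-R (or plain-Python) approach is harder to make efficient.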
R linked against its default matrix (BLAS/LAPACK) libraries can use only one CPU core for matrix computations. Revolution R Community edition ships with the Intel Math Kernel Library, which allows some matrix computations to run in parallel, but definitely not as efficiently as H2O. For SAS it is harder to say anything, since it is closed source, but based on my CPU utilization I would assume it takes an approach similar to Revolution R's: its matrix algebra exploits parallelism, but its algorithms are not as efficient as H2O's. Its data storage is also not as efficient as H2O's.
Lastly, H2O with R comes with a very different price tag than SAS: both H2O and R are open source, while SAS is commercial software.
Hope this clarifies a bit.