TY - GEN
T1 - Smart-MLlib
T2 - 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016
AU - Siegal, David
AU - Guo, Jia
AU - Agrawal, Gagan
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/12/6
Y1 - 2016/12/6
N2 - As the popularity of big data analytics has continued to grow, so has the need for accessible and scalable machine-learning implementations. In recent years, Apache Spark's machine-learning library, MLlib, has been used to fulfill this need. Though Spark outperforms Hadoop, it is not clear if it is the best performing underlying middleware to support machine learning implementations. Building on a C++ and MPI based middleware system, in-Situ MApReduce liTe (Smart), we present a machine-learning library prototype (Smart-MLlib). Like MLlib, Smart-MLlib allows machine learning implementations to be invoked from a Scala program, and with a very similar API. To test our library's performance, we built four machine-learning applications that are also provided in Spark's MLlib: k-means clustering, linear regression, Gaussian mixture models, and support vector machines. On average, we outperformed Spark's MLlib by over 800%. Our library also scaled better than Spark's MLlib for every application tested. Thus, the new machine-learning library enables higher performance than Spark's MLlib without sacrificing the easy-to-use API.
AB - As the popularity of big data analytics has continued to grow, so has the need for accessible and scalable machine-learning implementations. In recent years, Apache Spark's machine-learning library, MLlib, has been used to fulfill this need. Though Spark outperforms Hadoop, it is not clear if it is the best performing underlying middleware to support machine learning implementations. Building on a C++ and MPI based middleware system, in-Situ MApReduce liTe (Smart), we present a machine-learning library prototype (Smart-MLlib). Like MLlib, Smart-MLlib allows machine learning implementations to be invoked from a Scala program, and with a very similar API. To test our library's performance, we built four machine-learning applications that are also provided in Spark's MLlib: k-means clustering, linear regression, Gaussian mixture models, and support vector machines. On average, we outperformed Spark's MLlib by over 800%. Our library also scaled better than Spark's MLlib for every application tested. Thus, the new machine-learning library enables higher performance than Spark's MLlib without sacrificing the easy-to-use API.
UR - http://www.scopus.com/inward/record.url?scp=85013159142&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85013159142&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2016.49
DO - 10.1109/CLUSTER.2016.49
M3 - Conference contribution
AN - SCOPUS:85013159142
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 336
EP - 345
BT - Proceedings - 2016 IEEE International Conference on Cluster Computing, CLUSTER 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 September 2016 through 15 September 2016
ER -