TY - CPAPER
T1 - LaHiIO
T2 - 2018 IEEE International Conference on Big Data, Big Data 2018
AU - Aseeri, Ahmad O.
AU - Zhuang, Yu
AU - Alkatheiri, Mohammed Saeed
AU - Thapaliya, Bipana
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - The increasing use of big datasets by analytics applications for higher predictive power leads to higher processing overhead, which becomes more substantial when datasets exceed memory capacity. In this paper, we focus on reducing I/O overhead for big data machine learning procedures, covering both unsupervised and supervised learning. While the volume of I/O data is generally not reducible in well-developed applications, our approach to reducing I/O overhead is to overlap I/Os with computations, so that while an application performs an I/O, useful computation proceeds concurrently. To this end, we develop an I/O latency-hiding (LaHiIO) strategy and an enabling easy-to-use API, a wrapper around existing asynchronous I/O operations that hides features unlikely to be needed by general data analytics applications and keeps only those necessary for computation-I/O overlapping. By doing so, we aim to make computation-I/O overlapping accessible to a broad range of developers, including physicists, chemists, biologists, and engineers who are not necessarily system programming experts. We apply the LaHiIO strategy to clustering and neural network procedures, common choices for unsupervised and supervised learning respectively, obtaining performance improvements of about 10% to 150%, which indicates the effectiveness of the LaHiIO strategy and its enabling user-friendly API for big data machine learning applications.
AB - The increasing use of big datasets by analytics applications for higher predictive power leads to higher processing overhead, which becomes more substantial when datasets exceed memory capacity. In this paper, we focus on reducing I/O overhead for big data machine learning procedures, covering both unsupervised and supervised learning. While the volume of I/O data is generally not reducible in well-developed applications, our approach to reducing I/O overhead is to overlap I/Os with computations, so that while an application performs an I/O, useful computation proceeds concurrently. To this end, we develop an I/O latency-hiding (LaHiIO) strategy and an enabling easy-to-use API, a wrapper around existing asynchronous I/O operations that hides features unlikely to be needed by general data analytics applications and keeps only those necessary for computation-I/O overlapping. By doing so, we aim to make computation-I/O overlapping accessible to a broad range of developers, including physicists, chemists, biologists, and engineers who are not necessarily system programming experts. We apply the LaHiIO strategy to clustering and neural network procedures, common choices for unsupervised and supervised learning respectively, obtaining performance improvements of about 10% to 150%, which indicates the effectiveness of the LaHiIO strategy and its enabling user-friendly API for big data machine learning applications.
KW - Big Data
KW - Computation-I/O Overlapping
KW - Machine Learning
KW - Non-blocking I/O
UR - https://www.scopus.com/pages/publications/85062626233
U2 - 10.1109/BigData.2018.8622181
DO - 10.1109/BigData.2018.8622181
M3 - Conference contribution
AN - SCOPUS:85062626233
T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
SP - 2063
EP - 2070
BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
A2 - Abe, Naoki
A2 - Liu, Huan
A2 - Pu, Calton
A2 - Hu, Xiaohua
A2 - Ahmed, Nesreen
A2 - Qiao, Mu
A2 - Song, Yang
A2 - Kossmann, Donald
A2 - Liu, Bing
A2 - Lee, Kisung
A2 - Tang, Jiliang
A2 - He, Jingrui
A2 - Saltz, Jeffrey
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 December 2018 through 13 December 2018
ER -