TY - CPAPER
T1 - LaHiIO
T2 - 2018 IEEE International Conference on Big Data, Big Data 2018
AU - Aseeri, Ahmad O.
AU - Zhuang, Yu
AU - Alkatheiri, Mohammed Saeed
AU - Thapaliya, Bipana
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - The increasing use of big datasets by analytics applications for higher predictive power leads to higher processing overhead, which becomes more substantial when datasets exceed memory capacity. In this paper, we focus on reducing I/O overhead for big data machine learning procedures, covering both unsupervised and supervised learning. While the volume of I/O data is generally not reducible in well-developed applications, our approach to reducing I/O overhead is to overlap I/Os with computations, so that while an application performs an I/O, useful computation proceeds concurrently. To this end, we develop an I/O latency-hiding (LaHiIO) strategy and an enabling easy-to-use API, a wrapper around existing asynchronous I/O operations that hides features unlikely to be needed by general data analytics applications and keeps only those necessary for computation-I/O overlapping. By doing so, we aim to make computation-I/O overlapping accessible to a broad range of developers, including physicists, chemists, biologists, and engineers who are not necessarily system programming experts. We apply the LaHiIO strategy to clustering and neural network procedures, common choices for unsupervised and supervised learning respectively, obtaining performance improvements of about 10% to 150%, which indicates the effectiveness of the LaHiIO strategy and its enabling user-friendly API for big data machine learning applications.
AB - The increasing use of big datasets by analytics applications for higher predictive power leads to higher processing overhead, which becomes more substantial when datasets exceed memory capacity. In this paper, we focus on reducing I/O overhead for big data machine learning procedures, covering both unsupervised and supervised learning. While the volume of I/O data is generally not reducible in well-developed applications, our approach to reducing I/O overhead is to overlap I/Os with computations, so that while an application performs an I/O, useful computation proceeds concurrently. To this end, we develop an I/O latency-hiding (LaHiIO) strategy and an enabling easy-to-use API, a wrapper around existing asynchronous I/O operations that hides features unlikely to be needed by general data analytics applications and keeps only those necessary for computation-I/O overlapping. By doing so, we aim to make computation-I/O overlapping accessible to a broad range of developers, including physicists, chemists, biologists, and engineers who are not necessarily system programming experts. We apply the LaHiIO strategy to clustering and neural network procedures, common choices for unsupervised and supervised learning respectively, obtaining performance improvements of about 10% to 150%, which indicates the effectiveness of the LaHiIO strategy and its enabling user-friendly API for big data machine learning applications.
KW - Big Data
KW - Computation-I/O Overlapping
KW - Machine Learning
KW - Non-blocking I/O
UR - https://www.scopus.com/pages/publications/85062626233
U2 - 10.1109/BigData.2018.8622181
DO - 10.1109/BigData.2018.8622181
M3 - Conference contribution
AN - SCOPUS:85062626233
T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
SP - 2063
EP - 2070
BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
A2 - Abe, Naoki
A2 - Liu, Huan
A2 - Pu, Calton
A2 - Hu, Xiaohua
A2 - Ahmed, Nesreen
A2 - Qiao, Mu
A2 - Song, Yang
A2 - Kossmann, Donald
A2 - Liu, Bing
A2 - Lee, Kisung
A2 - Tang, Jiliang
A2 - He, Jingrui
A2 - Saltz, Jeffrey
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 December 2018 through 13 December 2018
ER -