LaHiIO: Accelerating Persistent Big Data Machine Learning via Latency Hiding IOs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The increasing use of big datasets by analytics applications for higher predictive power leads to higher processing overhead, and the overhead becomes more substantial when datasets are larger than memory capacity. In this paper, we focus on reducing I/O overhead for big data machine learning procedures, including both unsupervised and supervised learning. Because the volume of I/O data is, in general, not reducible in well-developed applications, our approach to I/O overhead reduction is to overlap I/O operations with computation, so that while an application is performing I/O, other useful computation proceeds as well. To this end, we develop an I/O latency-hiding (LaHiIO) strategy and an easy-to-use API that enables it: a wrapper around existing asynchronous I/O operations that hides features unlikely to be needed by general data analytics applications and keeps only those necessary for computation-I/O overlapping. By doing so, we aim to increase the use of computation-I/O overlapping in big data applications by a broad range of developers, who may be physicists, chemists, biologists, or engineers but not necessarily system programming experts. We apply the LaHiIO strategy to clustering and neural network procedures, common choices for unsupervised and supervised learning respectively, and obtain significant performance improvements of about 10% to 150%, indicating the effectiveness of the LaHiIO strategy and its user-friendly API for big data machine learning applications.
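
The computation-I/O overlapping idea described in the abstract can be illustrated with a minimal sketch. This is not the LaHiIO API, which this record does not describe; it is a generic prefetching pattern built on Python's standard `concurrent.futures` module, and the names `process_overlapped`, `chunk_size`, and `compute` are hypothetical. The sketch issues the read for the next chunk on a background thread before processing the current one, so the read latency can be hidden behind the computation.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def process_overlapped(stream, chunk_size, compute):
    """Apply `compute` to each chunk of `stream`, prefetching the next chunk
    on a background thread while the current chunk is being processed.

    Hypothetical helper for illustration only; not part of LaHiIO.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(stream.read, chunk_size)  # start the first read
        while True:
            chunk = pending.result()  # wait for the outstanding read
            if not chunk:             # empty bytes signals end of data
                break
            # Issue the next read, then compute on the current chunk
            # while that read is in flight.
            pending = pool.submit(stream.read, chunk_size)
            results.append(compute(chunk))
    return results


# An in-memory stream stands in for a dataset too large to hold in memory.
data = io.BytesIO(bytes(range(10)))
partial_sums = process_overlapped(data, chunk_size=4, compute=sum)
```

With a real file, the benefit depends on the read and the computation taking comparable time; if either dominates, the achievable overlap is bounded by the shorter of the two.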

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2063-2070
Number of pages: 8
ISBN (Electronic): 9781538650356
State: Published - 2 Jul 2018
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: 10 Dec 2018 to 13 Dec 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country/Territory: United States
City: Seattle
Period: 10/12/18 to 13/12/18

Keywords

  • Big Data
  • Computation-I/O Overlapping
  • Machine Learning
  • Non-blocking I/O
