On the organization of cluster voting with massive distributed streams

Adi Alhudhaif, Tong Yan, Simon Berkovich

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

Data processing is one of the major challenges of Big Data. In this paper we investigate optimal processing algorithms for massive data streams and propose a new one, the multi-buffer based majority algorithm. The algorithm maintains a time complexity of O(n) and selects prevalent elements with frequencies as low as 1%. Our experiments indicate that the multi-buffer based majority algorithm improves both accuracy and efficiency. We also use the multi-buffer based algorithm to process data streams on a single system and on a distributed system; these experiments indicate that the algorithm can perform better on a distributed system. Finally, we explain the experimental results and identify several major factors that influence the accuracy of the result: stream size, element range in the stream, frequency of predominant elements, and our buffer sets.
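
For orientation, the single-pass frequent-elements idea that majority-style stream algorithms build on can be sketched as below. This is the classical Misra-Gries summary, not the multi-buffer based majority algorithm of the paper, whose details the abstract does not give. The function names `misra_gries` and `frequent_elements` and the parameter `k` are illustrative assumptions. Setting `k = 100` would correspond to a 1% frequency threshold.

```python
from collections import Counter


def misra_gries(stream, k):
    """One pass, at most k - 1 counters.

    Every element occurring more than n/k times (n = stream length)
    is guaranteed to be among the returned candidates; the stored
    counts are lower bounds, not exact frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # All counters are in use: decrement each one and drop
            # those that reach zero. The new item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def frequent_elements(stream, k):
    """Second pass (assumes the stream can be replayed): keep only
    candidates whose exact count exceeds n/k."""
    data = list(stream)
    candidates = misra_gries(data, k)
    exact = Counter(x for x in data if x in candidates)
    threshold = len(data) / k
    return {x: c for x, c in exact.items() if c > threshold}


# Hypothetical stream of 100 items: 7 occurs 30 times, 3 occurs 25 times,
# and 45 other values occur once each.
stream = [7] * 30 + [3] * 25 + list(range(100, 145))
print(frequent_elements(stream, 5))
```

Each decrement step cancels earlier increments, so the total work over the stream is O(n) amortized, in line with the linear time complexity the abstract mentions for this class of algorithm.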

Original language: English
Title of host publication: Proceedings - 5th International Conference on Computing for Geospatial Research and Application, COM.Geo 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 55-62
Number of pages: 8
ISBN (Electronic): 9781479943210
State: Published - 24 Sep 2014
Externally published: Yes
Event: 5th International Conference on Computing for Geospatial Research and Application, COM.Geo 2014 - Washington, United States
Duration: 4 Aug 2014 to 6 Aug 2014

Publication series

Name: Proceedings - 5th International Conference on Computing for Geospatial Research and Application, COM.Geo 2014

Conference

Conference: 5th International Conference on Computing for Geospatial Research and Application, COM.Geo 2014
Country/Territory: United States
City: Washington
Period: 4/08/14 to 6/08/14

Keywords

  • big data clusterization
  • cloud computing
  • majority algorithm
  • stream processing
