ETAOSD: Static dictionary-based transformation method for text compression

Fadlelmoula Mohamed Baloul, Mohsin Hassan Abdullah, Elsadig Ahmed Babikir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

This paper presents a new static dictionary-based algorithm for text transformation that increases the compression ratio achieved by standard compression tools. The basic idea is to define a pattern for each word in a static dictionary by replacing all or most of the characters in the word with the character used most frequently in text files. The proposed algorithm transforms any text file into an encoded file of almost the same size as the original but with different statistical properties. The new transformation method has been designed, implemented, and tested on the Gutenberg Corpus. The results show varying levels of improvement across common standard data compression tools and methods such as arithmetic coding, Huffman coding, Bzip2, Gzip and WinZip. The compression performance of all of these tools improved, especially when the patterns of the transformed words were passed through a low-cost run-length encoding (RLE) algorithm. With Bzip2, the resulting output files achieved a compression ratio of about 76.75% with an average code length of 1.88. The final result is very promising and could be improved further by applying a dynamic dictionary-based text transformation technique.
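The word-to-pattern idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' ETAOSD implementation: the abstract does not give the dictionary, the fill character, or the pattern layout, so the fill character `e`, the suffix alphabet, and the function names `build_patterns`, `transform` and `inverse` are all assumptions made for illustration. The RLE step mentioned in the abstract is omitted.

```python
import re
from collections import defaultdict
from itertools import product

FILL = "e"  # assumption: stand-in for the most frequently used character
OTHERS = "abcdfghijklmnopqrstuvwxyz"  # suffix alphabet; deliberately excludes FILL
WORD = re.compile(r"[A-Za-z]+")


def _patterns_of_length(n):
    """Yield distinct length-n patterns: FILL repeated, then a short suffix."""
    for k in range(n + 1):
        for suffix in product(OTHERS, repeat=k):
            yield FILL * (n - k) + "".join(suffix)


def build_patterns(words):
    """Map each dictionary word to a same-length pattern made mostly of FILL."""
    by_length = defaultdict(list)
    for word in words:
        by_length[len(word)].append(word)
    patterns = {}
    for n, group in by_length.items():
        # Earlier words in the dictionary get patterns with more FILL characters.
        for word, pattern in zip(group, _patterns_of_length(n)):
            patterns[word] = pattern
    return patterns


def transform(text, patterns):
    """Replace dictionary words by their patterns; other tokens pass through."""
    used = set(patterns.values())

    def repl(match):
        token = match.group(0)
        if token in patterns:
            return patterns[token]
        if token in used:
            # A real scheme would need an escape mechanism for this case.
            raise ValueError(f"token {token!r} collides with a pattern")
        return token

    return WORD.sub(repl, text)


def inverse(text, patterns):
    """Undo transform() by mapping patterns back to dictionary words."""
    reverse = {pattern: word for word, pattern in patterns.items()}
    return WORD.sub(lambda m: reverse.get(m.group(0), m.group(0)), text)
```

Because each pattern has the same length as the word it replaces, the transformed text keeps the size of the original while its character distribution becomes heavily skewed toward the fill character, which is the property the abstract relies on to help the downstream compressor.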

Original language: English
Title of host publication: Proceedings - 2013 International Conference on Computer, Electrical and Electronics Engineering
Subtitle of host publication: 'Research Makes a Difference', ICCEEE 2013
Pages: 384-389
Number of pages: 6
State: Published - 2013
Externally published: Yes
Event: 2013 1st IEEE International Conference on Computing, Electrical and Electronics Engineering, ICCEEE 2013 - Khartoum, Sudan
Duration: 26 Aug 2013 → 28 Aug 2013

Publication series

Name: Proceedings - 2013 International Conference on Computer, Electrical and Electronics Engineering: 'Research Makes a Difference', ICCEEE 2013

Conference

Conference: 2013 1st IEEE International Conference on Computing, Electrical and Electronics Engineering, ICCEEE 2013
Country/Territory: Sudan
City: Khartoum
Period: 26/08/13 → 28/08/13

Keywords

  • Average Code Length (ACL)
  • Text Compression
  • Text Preprocessing
  • Text Transformation
