LZW Data Compression Efficient Compression Algorithm for Lossless Data Compression Abdulaziz Awwad Aljuaid 44108981 Supervised by Prof. Dr. Hatem Ghazi Zaini October 12, 2024
2 Outline  Introduction to Data Compression  History of LZW Compression  Basic Concepts of LZW Compression  LZW Compression Process  LZW Compression Example  LZW Decompression Process  LZW Decompression Example  Advantages of LZW Over Huffman  Limitations of LZW Algorithm  Applications of LZW  Conclusion  References
3 Introduction to Data Compression What is Data Compression, at first?  A process of reducing the amount of data “represented by bits” needed for the storage or transmission of a given piece of information.  Data compression is important in storing information digitally on computer disks and in transmitting it over communication channels.  Data compression is a necessary process in the digital world as the size of data increases over the time, more and more data are required to be transmitted efficiently at lower bandwidth by utilizing the payload with compressions methods.
4 Introduction to Data Compression Data Compression Category  Compressing data has a variety of methods for compressing mechanism to work.  Generally these methods categorized as either Lossless and Lossy compression.  Lossless “exact” compression recovers and rebuilds the data in its original form after decompression “reversal of compressing”.  Whilst lossy “inexact” does not restore data to its original shape.  Lossless is used mostly for text where every character is need, while lossy used for images, voice etc.
5 History of LZW Compression  LZ77  Original Lempel Ziv approach to data compression was first published 1977.  The LZ77 compression algorithms are commonly found in text compression and archiving programs, such as compress in Unix system.  Authored by Abraham Lempel and Jacob Ziv.  LZ78  Followed by an alternate approach in 1978  Improved version of LZ77 by using dictionary-based approach.  More commonly used to compress binary data, such as bitmaps.  LZW  In 1984, while working in Unisys, Terry Welch modified LZ78 compressor for implementation in high-performance disk controllers.  Resulted a surprisingly simple general algorithm.
6 Basic Concepts of LZW Compression  Dictionary-based algorithm  LZW is referred to as a substitutional or dictionary based encoding algorithm.  It builds a data dictionary (also called translation table or string table) of a data occurring in an uncompressed data streams.  Patterns of data (substrings) are identified in the data stream and are matched to entries in the dictionary.  Dictionary growth  Dynamic dictionary is created by both compressor and decompressor with small dictionary of all possible single-character strings like ASCII characters.  No dictionary transmission which reduces transmission overhead.
7 LZW Compression Process The process 1. Initialize an empty dictionary. 2. Populate the emptied dictionary with all possible character from the data stream. 3. Initialize an empty string P. 4. Do the iterative process until the end of the character stream: 1. C next character ← 2. If P + C exists in the dictionary 1. P P + C ← (Extend by adding C to the string P) 3. Otherwise 1. Output the code for P 2. Append the string P + C to the dictionary 3. P C ← (Replace the string P with the current character)
8 LZW Compression Example (1)
9 LZW Compression Example (2)
10 LZW Compression Example (3)
11 LZW Compression Example (4)
12 LZW Compression Example (5)
13 LZW Compression Example (6)
14 LZW Compression Example (7)
15 LZW Compression Example (8)
16 LZW Compression Example (9)
17 LZW Decompression Process The process 1. Initialize the dictionary with all possible characters. 2. Decode first index of W 3. Append W? in the dictionary 4. Repeat 1. Decode the first symbol S of the index 2. Complete the previous dictionary entry with S 3. Append W? in the where W was just decoded
18 LZW Decompression Example (1) Dictionary 0 1 2 4 3 6 0 a 1 b 2 a? a
19 LZW Decompression Example (2a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab a b
20 LZW Decompression Example (2b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 b? a b
21 LZW Decompression Example (3a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba a b a
22 LZW Decompression Example (3b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 ab? a b ab
23 LZW Decompression Example (4a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba a b ab a
24 LZW Decompression Example (4b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 aba? a b ab aba
25 LZW Decompression Example (5a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab a b ab aba b
26 LZW Decompression Example (5b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab 6 ba? a b ab aba ba
27 LZW Decompression Example (6a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab 6 bab a b ab aba ba b
28 LZW Decompression Example (6b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab 6 bab 7 bab? a b ab aba ba bab
29 Advantages of LZW Over Huffman  The algorithm is straightforward, simple and effective.  LZW does not require any prior knowledge of the input data stream unlike Huffman.  It has 60-70% compression ratio for huge text files.  It performs better for files with a lot of repetitive data.  It compresses the data in a single pass.
30 Limitations of LZW Algorithm  LZW has a fixed dictionary size usually limited for 4096 entries “4KB”.  Inefficient for small files.  Slower compression for complex data if there is little repetition.  Larger dictionaries require significant memory so it increases devices usage overhead.
31 Applications of LZW  File Compression  Used widely in file compression like ZIP, GIF.  Image Compression  It is the core component for GIF image format.  PDF  Compressing document contents.  Data Archiving  Used in various archival tools like compress utility tool in Unix system.  Telecommunications  LZW is applied in modem protocols like V.42bis.
32 Conclusion  Lempel–Ziv–Welch (LZW) Algorithm is a common lossless data compression algorithm.  Abraham Lempel, Jakob Ziv, and Terry Welch developed the LZW compression algorithm.  Lossless compression algorithms reduce file size without hampering any information in the file.  Lempel–Ziv–Welch (LZW) algorithm is used in the GIF image format and is part of the widely used Unix file compression utility compress.  LZW is a “dictionary-based” lossless compression algorithm that scans a file for data patterns that appear more than once.  The LZW Algorithm is divided into two parts: the encoding algorithm, which converts strings to integer codes, and the decoding algorithm, which does the opposite.  We should not prefer the LZW algorithm for compression when there is no repetitive data.
33 References [1] Data Compression on Britannica. Link [2] Data Compression on ScienceDirect. Link [3] LZW Data Compression, Mark Nelson, 1989. Link Via Internet Archive [4] Lempel-Ziv-Welch (LZW) Compression, 2007. Link Via Internet Archive [5] LZW – compression, The Prague Stringology Club. Link [6] Data Compression on TechTarget. Link

LZW Data Compression: Efficient Compression Algorithm for Lossless Data Compression

  • 1.
    LZW Data Compression EfficientCompression Algorithm for Lossless Data Compression Abdulaziz Awwad Aljuaid 44108981 Supervised by Prof. Dr. Hatem Ghazi Zaini October 12, 2024
  • 2.
    2 Outline  Introduction toData Compression  History of LZW Compression  Basic Concepts of LZW Compression  LZW Compression Process  LZW Compression Example  LZW Decompression Process  LZW Decompression Example  Advantages of LZW Over Huffman  Limitations of LZW Algorithm  Applications of LZW  Conclusion  References
  • 3.
    3 Introduction to DataCompression What is Data Compression, at first?  A process of reducing the amount of data “represented by bits” needed for the storage or transmission of a given piece of information.  Data compression is important in storing information digitally on computer disks and in transmitting it over communication channels.  Data compression is a necessary process in the digital world as the size of data increases over the time, more and more data are required to be transmitted efficiently at lower bandwidth by utilizing the payload with compressions methods.
  • 4.
    4 Introduction to DataCompression Data Compression Category  Compressing data has a variety of methods for compressing mechanism to work.  Generally these methods categorized as either Lossless and Lossy compression.  Lossless “exact” compression recovers and rebuilds the data in its original form after decompression “reversal of compressing”.  Whilst lossy “inexact” does not restore data to its original shape.  Lossless is used mostly for text where every character is need, while lossy used for images, voice etc.
  • 5.
    5 History of LZWCompression  LZ77  Original Lempel Ziv approach to data compression was first published 1977.  The LZ77 compression algorithms are commonly found in text compression and archiving programs, such as compress in Unix system.  Authored by Abraham Lempel and Jacob Ziv.  LZ78  Followed by an alternate approach in 1978  Improved version of LZ77 by using dictionary-based approach.  More commonly used to compress binary data, such as bitmaps.  LZW  In 1984, while working in Unisys, Terry Welch modified LZ78 compressor for implementation in high-performance disk controllers.  Resulted a surprisingly simple general algorithm.
  • 6.
    6 Basic Concepts ofLZW Compression  Dictionary-based algorithm  LZW is referred to as a substitutional or dictionary based encoding algorithm.  It builds a data dictionary (also called translation table or string table) of a data occurring in an uncompressed data streams.  Patterns of data (substrings) are identified in the data stream and are matched to entries in the dictionary.  Dictionary growth  Dynamic dictionary is created by both compressor and decompressor with small dictionary of all possible single-character strings like ASCII characters.  No dictionary transmission which reduces transmission overhead.
  • 7.
    7 LZW Compression Process Theprocess 1. Initialize an empty dictionary. 2. Populate the emptied dictionary with all possible character from the data stream. 3. Initialize an empty string P. 4. Do the iterative process until the end of the character stream: 1. C next character ← 2. If P + C exists in the dictionary 1. P P + C ← (Extend by adding C to the string P) 3. Otherwise 1. Output the code for P 2. Append the string P + C to the dictionary 3. P C ← (Replace the string P with the current character)
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
    17 LZW Decompression Process Theprocess 1. Initialize the dictionary with all possible characters. 2. Decode first index of W 3. Append W? in the dictionary 4. Repeat 1. Decode the first symbol S of the index 2. Complete the previous dictionary entry with S 3. Append W? in the where W was just decoded
  • 18.
    18 LZW Decompression Example(1) Dictionary 0 1 2 4 3 6 0 a 1 b 2 a? a
  • 19.
    19 LZW Decompression Example(2a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab a b
  • 20.
    20 LZW Decompression Example(2b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 b? a b
  • 21.
    21 LZW Decompression Example(3a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba a b a
  • 22.
    22 LZW Decompression Example(3b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 ab? a b ab
  • 23.
    23 LZW Decompression Example(4a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba a b ab a
  • 24.
    24 LZW Decompression Example(4b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 aba? a b ab aba
  • 25.
    25 LZW Decompression Example(5a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab a b ab aba b
  • 26.
    26 LZW Decompression Example(5b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab 6 ba? a b ab aba ba
  • 27.
    27 LZW Decompression Example(6a) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab 6 bab a b ab aba ba b
  • 28.
    28 LZW Decompression Example(6b) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab 6 bab 7 bab? a b ab aba ba bab
  • 29.
    29 Advantages of LZWOver Huffman  The algorithm is straightforward, simple and effective.  LZW does not require any prior knowledge of the input data stream unlike Huffman.  It has 60-70% compression ratio for huge text files.  It performs better for files with a lot of repetitive data.  It compresses the data in a single pass.
  • 30.
    30 Limitations of LZWAlgorithm  LZW has a fixed dictionary size usually limited for 4096 entries “4KB”.  Inefficient for small files.  Slower compression for complex data if there is little repetition.  Larger dictionaries require significant memory so it increases devices usage overhead.
  • 31.
    31 Applications of LZW File Compression  Used widely in file compression like ZIP, GIF.  Image Compression  It is the core component for GIF image format.  PDF  Compressing document contents.  Data Archiving  Used in various archival tools like compress utility tool in Unix system.  Telecommunications  LZW is applied in modem protocols like V.42bis.
  • 32.
    32 Conclusion  Lempel–Ziv–Welch (LZW)Algorithm is a common lossless data compression algorithm.  Abraham Lempel, Jakob Ziv, and Terry Welch developed the LZW compression algorithm.  Lossless compression algorithms reduce file size without hampering any information in the file.  Lempel–Ziv–Welch (LZW) algorithm is used in the GIF image format and is part of the widely used Unix file compression utility compress.  LZW is a “dictionary-based” lossless compression algorithm that scans a file for data patterns that appear more than once.  The LZW Algorithm is divided into two parts: the encoding algorithm, which converts strings to integer codes, and the decoding algorithm, which does the opposite.  We should not prefer the LZW algorithm for compression when there is no repetitive data.
  • 33.
    33 References [1] Data Compressionon Britannica. Link [2] Data Compression on ScienceDirect. Link [3] LZW Data Compression, Mark Nelson, 1989. Link Via Internet Archive [4] Lempel-Ziv-Welch (LZW) Compression, 2007. Link Via Internet Archive [5] LZW – compression, The Prague Stringology Club. Link [6] Data Compression on TechTarget. Link