Performance Analysis of Huffman and Arithmetic Coding Compression Algorithms
By: Ramakant Soni
Data Compression
• Data compression is the process of encoding information using fewer bits than the original representation.
• It reduces the size of a data file; its formal name is source coding.
Compression types:
1. Lossless compression
2. Lossy compression
• Compression is useful because it reduces resource usage, such as data storage space or transmission capacity.
• Data compression is subject to a space-time complexity trade-off.
Need for analysis
• Which compression algorithm is better?
• Decision factor: complexity
Computational complexity theory classifies computational problems according to their inherent difficulty. The difficulty of a problem means the resources used to obtain its solution.
Resources:
1. Space
2. Time
Compression algorithms for analysis
• Huffman Coding
• Arithmetic Coding
Huffman Coding
• Huffman coding is an entropy-encoding algorithm used for lossless data compression.
• It encodes each source symbol into a variable-length code derived from the estimated probability of occurrence of that symbol.
Huffman Coding Algorithm:
1. Start with a list of symbols and their frequencies in the alphabet.
2. Select the two symbols with the lowest frequencies.
3. Add their frequencies and merge the two symbols into one entry.
4. Repeat from step 2 until only two values remain.
5. The algorithm creates a prefix code for each symbol in the alphabet by traversing the merges backward, assigning 0 and 1 to the two values in each phase (a runnable sketch follows below).
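As a concrete illustration of steps 1-5, here is a minimal Python sketch of the merging procedure. It uses a binary heap rather than a repeatedly re-sorted list, and all names are illustrative rather than taken from the slides:

import heapq
from itertools import count
from typing import Dict

def huffman_codes(freq: Dict[str, int]) -> Dict[str, str]:
    """Build prefix codes by repeatedly merging the two lowest-frequency entries."""
    tie = count()  # tie-breaker so equal frequencies never fall through to comparing dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)   # lowest frequency (step 2)
        f2, _, group2 = heapq.heappop(heap)   # second-lowest frequency
        merged = {s: "0" + c for s, c in group1.items()}        # prepend 0 on one branch
        merged.update({s: "1" + c for s, c in group2.items()})  # prepend 1 on the other (step 5)
        heapq.heappush(heap, (f1 + f2, next(tie), merged))      # step 3: add the frequencies
    return heap[0][2]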
Example: Huffman Coding
Input: six symbols with their respective probability/frequency values
Steps:
1. Arrange the symbols in descending order of frequency.
2. Add the two lowest values and remove their symbols.
3. At the end we are left with only two values.
Example: Huffman Coding
Steps:
1. Moving backward, assign the bits to the values.
2. In each successive (reverse) phase, extend the bit values as in step 1.
Result: symbols with variable-length codes (illustrated below).
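The slide's own six-symbol frequency table is not reproduced in this text, so here is a hypothetical one run through the sketch above to show the variable-length result:

freq = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}
print(huffman_codes(freq))
# The most frequent symbol "A" receives a 1-bit code, while the
# rarest symbols "E" and "F" receive 4-bit codes.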
Huffman Coding complexity analysis
• The Huffman algorithm takes a greedy approach to obtain the optimal result (code length).
• From the pseudocode, the running-time function is: T(n) = N[n + log(2n−1)] + Sn
• Complexity: O(N²) (sketched below)
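A brief sketch of how the running-time function yields the stated bound, assuming the number of unique symbols n is at most the input length N and the data-structure maintenance cost S is constant:

T(n) = N\bigl[n + \log(2n-1)\bigr] + Sn
     \le N\bigl[N + \log(2N-1)\bigr] + SN
     = O(N^2)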
Arithmetic Coding
• Arithmetic coding is a form of entropy encoding used in lossless data compression.
• Normally, a string of characters is represented using a fixed number of bits per character (as in ASCII).
• When a string is converted to arithmetic encoding, frequently used characters are stored with fewer bits and rarely occurring characters with more bits.
• Arithmetic coding encodes the entire message into a single number, a fraction n with 0.0 ≤ n < 1.0.
Arithmetic Coding Algorithm:
1. Take the symbols and their probability values as input.
2. The encoder divides the current interval [0, 1) into sub-intervals, each representing a fraction of the current interval proportional to the probability of that symbol in the current context.
3. Whichever interval corresponds to the actual symbol to be encoded next becomes the interval used in the next step.
4. Once all symbols are encoded, the resulting interval unambiguously identifies the sequence of symbols that produced it.
5. Using the final interval and the model, we can reconstruct the symbol sequence that must have entered the encoder to produce that interval (see the sketch after this list).
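The following Python sketch implements the interval narrowing of steps 2-3 and the reconstruction of step 5. It uses floating-point intervals for clarity (a practical coder renormalizes with integer arithmetic to avoid precision loss); the names and the fixed symbol order are assumptions, not from the slides:

from typing import Dict, List, Tuple

def sub_intervals(probs: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Assign each symbol a slice of [0, 1) proportional to its probability."""
    out, low = {}, 0.0
    for sym, p in probs.items():
        out[sym] = (low, low + p)
        low += p
    return out

def arithmetic_encode(message: List[str], probs: Dict[str, float]) -> Tuple[float, float]:
    """Narrow [L, L+R) once per symbol; the final interval identifies the message."""
    ranges = sub_intervals(probs)
    low, width = 0.0, 1.0
    for sym in message:
        s_low, s_high = ranges[sym]
        low += width * s_low        # shift into the symbol's sub-interval
        width *= (s_high - s_low)   # scale by the symbol's probability
    return low, low + width

def arithmetic_decode(value: float, probs: Dict[str, float], n: int) -> List[str]:
    """Recover n symbols from any value inside the final interval (step 5)."""
    ranges = sub_intervals(probs)
    out: List[str] = []
    for _ in range(n):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)  # rescale to [0, 1)
                break
    return out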
Example: Arithmetic Coding
Input: three symbols (x) with respective probabilities (p), and a message
X = {0, 1, 2}, P = {P0, P1, P2} = {0.2, 0.4, 0.4}, message x1..x3 = 210 (worked through below)
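Running this input through the sketch above, with sub-intervals 0 → [0, 0.2), 1 → [0.2, 0.6), 2 → [0.6, 1.0):

probs = {"0": 0.2, "1": 0.4, "2": 0.4}
low, high = arithmetic_encode(["2", "1", "0"], probs)
# Interval after "2": [0.6, 1.0); after "1": [0.68, 0.84); after "0": [0.68, 0.712)
assert arithmetic_decode(0.7, probs, 3) == ["2", "1", "0"]

The shortest binary fraction inside the final interval [0.68, 0.712) is 0.1011 in binary (= 0.6875), so by the transmission rule on the next slide this message can be sent in four bits.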
Arithmetic Coding complexity analysis
ARITHMETIC_IDEAL_ENCODE(M)
  Set L = 0 and R = 1
  FOR i = 1 to |M| DO
    Set L = L + R × cum(mᵢ)   (cum(mᵢ): total probability of the symbols ordered before mᵢ)
    Set R = R × p(mᵢ)         (p(mᵢ): probability of symbol mᵢ)
  END FOR
  Transmit the shortest binary fractional number that lies in the interval [L, L+R)
END
Arithmetic Coding complexity analysis
Running-time function: T(n) = N[log n + a] + Sn
  N → number of input symbols
  n → current number of unique symbols
  S → time to maintain the internal data structure
Complexity: O(N log n)
Complexity comparison
Huffman Coding → O(N²)
Arithmetic Coding → O(N log n)
Conclusion
• Arithmetic coding surpasses the Huffman technique in compression ability.
• The Huffman method assigns an integral number of bits to each symbol, while arithmetic coding assigns one long code to the entire input string.
• Arithmetic coding requires only a few arithmetic operations per symbol, which keeps its complexity low.
• In terms of complexity, arithmetic coding (O(N log n)) is asymptotically better than Huffman's (O(N²)).
Thank You
