CS 753 DIP
Module 5: Image Compression
CS 414 - Spring 2011
Outline
• Statistical Entropy Coding
– Huffman coding
– Arithmetic coding
Huffman Encoding
• Statistical encoding
• To determine the Huffman code, it is useful to construct a binary tree
• Leaves are the characters to be encoded
• Nodes carry the occurrence probabilities of the characters belonging to their sub-tree
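The construction can be written in a few lines of Python. This is a minimal sketch, not from the slides (function and variable names are ours); it uses the probabilities from the worked example that follows.

```python
import heapq
import itertools

def build_huffman_tree(probs):
    """Repeatedly merge the two smallest-probability nodes.

    A node is a tuple (probability, tiebreak, payload); a leaf's payload
    is its symbol, an internal node's payload is (left_child, right_child).
    """
    counter = itertools.count()  # tie-breaker so equal probabilities never compare payloads
    heap = [(p, next(counter), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        left = heapq.heappop(heap)   # smallest remaining node
        right = heapq.heappop(heap)  # second-smallest remaining node
        heapq.heappush(heap, (left[0] + right[0], next(counter), (left, right)))
    return heap[0]  # root node, carrying probability 1.0

root = build_huffman_tree({"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11})
```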
Huffman Encoding (Example)
Step 1: Sort all symbols by their probabilities (left to right), from smallest to largest;
these are the leaves of the Huffman tree:

P(C) = 0.09   P(E) = 0.11   P(D) = 0.13   P(A) = 0.16   P(B) = 0.51
Huffman Encoding (Example)
Step 2: Build a binary tree from left to right.
Policy: always connect the two smallest nodes first (e.g., P(CE) and P(DA) both had
probabilities smaller than P(B), hence those two were connected first).

P(CEDAB) = 1
  P(CEDA) = 0.49
    P(CE) = 0.20: P(C) = 0.09, P(E) = 0.11
    P(DA) = 0.29: P(D) = 0.13, P(A) = 0.16
  P(B) = 0.51
Huffman Encoding (Example)
Step 3: Label left branches of the tree with 0 and right branches with 1:

P(CEDAB) = 1
  0 → P(CEDA) = 0.49
    0 → P(CE) = 0.20
      0 → P(C) = 0.09
      1 → P(E) = 0.11
    1 → P(DA) = 0.29
      0 → P(D) = 0.13
      1 → P(A) = 0.16
  1 → P(B) = 0.51
Huffman Encoding (Example)
Step 4: Create the Huffman code by reading the branch labels from the root to each leaf:

Symbol A = 011
Symbol B = 1
Symbol C = 000
Symbol D = 010
Symbol E = 001
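Reading the codes out of the tree can also be sketched in code. Here the tree from Steps 2–3 is hard-coded as nested pairs (our own illustration, not from the slides); the printed table matches the codes above.

```python
# The tree from Steps 2-3 as nested (left, right) pairs; a leaf is its symbol.
tree = ((("C", "E"), ("D", "A")), "B")

def assign_codes(node, prefix=""):
    """Left branches append '0', right branches append '1' (as in Step 3)."""
    if isinstance(node, str):               # leaf: emit the accumulated path
        return {node: prefix or "0"}
    left, right = node
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes

print(assign_codes(tree))
# {'C': '000', 'E': '001', 'D': '010', 'A': '011', 'B': '1'}
```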
Huffman Decoding
• Assume the Huffman table:

Symbol   Code
X        0
Y        10
Z        11

• Consider the encoded bitstream: 000101011001110
• What is the decoded string?
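A minimal decoder sketch (ours, not from the slides): because no codeword is a prefix of another, the bitstream can be consumed greedily, emitting a symbol whenever the buffered bits match a codeword. Running it on the bitstream above gives the answer.

```python
def huffman_decode(bits, table):
    """Decode a bitstream using a prefix-free code table {symbol: codeword}."""
    inverse = {code: sym for sym, code in table.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:       # a complete codeword has been read
            decoded.append(inverse[buffer])
            buffer = ""
    assert buffer == "", "bitstream ended in the middle of a codeword"
    return "".join(decoded)

print(huffman_decode("000101011001110", {"X": "0", "Y": "10", "Z": "11"}))
# -> XXXYYZXXZY
```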
Huffman Example
• Construct the Huffman coding tree (in class)

Symbol (S)   P(S)
A            0.25
B            0.30
C            0.12
D            0.15
E            0.18

Characteristics of Solution

Symbol (S)   Code
A            01
B            11
C            100
D            101
E            00
Example Encoding/Decoding
Encode “BEAD”: 110001101

Decode “0101100”

Symbol (S)   Code
A            01
B            11
C            100
D            101
E            00
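The same greedy idea checks both directions of this slide (a sketch with our own helper names):

```python
table = {"A": "01", "B": "11", "C": "100", "D": "101", "E": "00"}
inverse = {code: sym for sym, code in table.items()}

def encode(text):
    # Concatenate the codeword of each symbol
    return "".join(table[ch] for ch in text)

def decode(bits):
    # Greedy prefix-code decoding, as in the sketch on the previous slide
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            decoded.append(inverse[buffer])
            buffer = ""
    return "".join(decoded)

print(encode("BEAD"))     # 110001101
print(decode("0101100"))  # AAC
```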
Entropy (Theoretical Limit)
H = -\sum_{i=1}^{N} p(s_i) \log_2 p(s_i)

H = -( .25 log2 .25 + .30 log2 .30 + .12 log2 .12 + .15 log2 .15 + .18 log2 .18 )
H = 2.24 bits

Symbol   P(S)   Code
A        0.25   01
B        0.30   11
C        0.12   100
D        0.15   101
E        0.18   00
Average Codeword Length
L = \sum_{i=1}^{N} p(s_i) \cdot codelength(s_i)

L = .25(2) + .30(2) + .12(3) + .15(3) + .18(2)
L = 2.27 bits

Symbol   P(S)   Code
A        0.25   01
B        0.30   11
C        0.12   100
D        0.15   101
E        0.18   00
Code Length Relative to Entropy
L = \sum_{i=1}^{N} p(s_i) \cdot codelength(s_i)        H = -\sum_{i=1}^{N} p(s_i) \log_2 p(s_i)
• Huffman reaches entropy limit when all
probabilities are negative powers of 2
– i.e., 1/2; 1/4; 1/8; 1/16; etc.
• H <= Code Length <= H + 1
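A quick numeric check of both formulas on the example alphabet (a sketch; the table values are from the slides above):

```python
from math import log2

probs = {"A": 0.25, "B": 0.30, "C": 0.12, "D": 0.15, "E": 0.18}
codes = {"A": "01", "B": "11", "C": "100", "D": "101", "E": "00"}

H = -sum(p * log2(p) for p in probs.values())     # entropy
L = sum(probs[s] * len(codes[s]) for s in probs)  # average codeword length

print(f"H = {H:.2f} bits, L = {L:.2f} bits")  # H = 2.24 bits, L = 2.27 bits
assert H <= L <= H + 1  # Huffman stays within one bit of the entropy limit
```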
Example
H = -( .01 log2 .01 + .99 log2 .99 ) = .08 bits

L = .01(1) + .99(1) = 1 bit

Symbol   P(S)   Code
A        0.01   1
B        0.99   0
Group Exercise
• Compute Entropy (H)
• Build Huffman tree
• Compute average code length
• Code “BCCADE”

Symbol (S)   P(S)
A            0.1
B            0.2
C            0.4
D            0.2
E            0.1
Limitations
• Diverges from the lower limit when the probability of a particular symbol becomes high
– always uses an integral number of bits per symbol
• Must send the code book with the data
– lowers overall efficiency
• Must determine the frequency distribution
– it must remain stable over the data set
Arithmetic Coding
• As optimal as Huffman coding with respect to compression ratio
• Better than Huffman with respect to the amount of side information that must be transmitted
– Huffman needs to transmit the Huffman tables with the compressed data
– Arithmetic coding needs to transmit only the length of the encoded string with the compressed data
Arithmetic Coding
• Each symbol is coded by considering all prior data
• Encoded data must be read from the beginning; no random access is possible
• Each real number (< 1) is represented as a binary fraction (see the sketch below)
– 0.5 = 2^-1 (binary fraction 0.1); 0.25 = 2^-2 (binary fraction 0.01); 0.625 = 0.5 + 0.125 (binary fraction 0.101); etc.
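The slides give no implementation; as a hedged sketch of the core idea (symbol table, probabilities, and helper names are our own), each symbol narrows the current interval [low, high) in proportion to its probability, and any binary fraction inside the final interval identifies the whole string. A real coder uses integer arithmetic with renormalization rather than floats.

```python
def arithmetic_encode(symbols, probs):
    """Return an interval [low, high) identifying the symbol sequence."""
    # Precompute each symbol's sub-interval of [0, 1), in dictionary order
    ranges, start = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p

    low, high = 0.0, 1.0
    for sym in symbols:
        width = high - low
        s_low, s_high = ranges[sym]
        # Narrow the interval to this symbol's slice of the current range
        low, high = low + width * s_low, low + width * s_high
    return low, high

low, high = arithmetic_encode("BE", {"A": 0.25, "B": 0.30, "C": 0.12,
                                     "D": 0.15, "E": 0.18})
print(low, high)  # any binary fraction in [low, high) encodes "BE"
```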
Other Compression Steps
• Picture processing (Source Coding)
– Transformation from time to frequency domain
(e.g., use Discrete Cosine Transform)
– Motion vector computation in video
• Quantization
– Reduction of precision, e.g., cut least significant bits (a sketch follows this list)
– Quantization matrix, quantization values
• Entropy Coding
– Huffman Coding + RLE
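A hedged sketch of the quantization step on a 4×4 block (the coefficient values and quantization matrix below are illustrative only, not from any standard): each transform coefficient is divided by its quantization value and rounded, which is where precision is cut.

```python
coeffs = [
    [231.0, 12.4, -8.2, 1.1],
    [ 15.3, -6.7,  2.0, 0.4],
    [ -4.1,  2.2, -0.9, 0.1],
    [  1.0, -0.3,  0.2, 0.0],
]
q_matrix = [
    [16, 11, 10, 16],
    [12, 12, 14, 19],
    [14, 13, 16, 24],
    [14, 17, 22, 29],
]

# Quantize: divide by the matrix entry and round (precision is lost here)
quantized = [[round(c / q) for c, q in zip(crow, qrow)]
             for crow, qrow in zip(coeffs, q_matrix)]

# Dequantize: only an approximation of the original block is recovered
restored = [[v * q for v, q in zip(vrow, qrow)]
            for vrow, qrow in zip(quantized, q_matrix)]
```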
Audio Compression and Formats (Hybrid Coding Schemes)
• MP3 (MPEG Audio Layer III)
• ADPCM
• µ-law
• Real Audio
• Windows Media (.wma)
• Sun (.au)
• Apple (.aif)
• Microsoft (.wav)
Image Compression and Formats
• RLE
• Huffman
• LZW
• GIF
• JPEG / JPEG-2000 (Hybrid Coding)
• Fractals
• TIFF, PICT, BMP, etc.
Video Compression and Formats (Hybrid Coding Schemes)
• H.261 / H.263 / H.264
• Cinepak (1992; early video codec in Apple’s QuickTime video suite)
• Sorenson (Sorenson Media; used in QuickTime and Macromedia Flash)
• Indeo (1992; early Intel video codec)
• RealVideo (1997; RealNetworks)
• MPEG-1, MPEG-2, MPEG-4, etc.
• QuickTime, AVI, WMV (Windows Media Video)
Summary
• Important Lossless (Entropy) Coders
– RLE, Huffman Coding and Arithmetic Coding
• Important Lossy (Source) Coders
– Quantization
• Differential PCM (DPCM) – calculate the difference from previous values, so there are fewer distinct values to encode (see the sketch below)
• Loss occurs during quantization of the sample values, hence the differences lose precision as well
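The slides include no code; as a hedged DPCM sketch (our own function names and step size), the encoder quantizes each difference against the reconstructed previous sample, so encoder and decoder stay in sync. The rounding step is where the loss described above enters.

```python
def dpcm_encode(samples, step=4):
    """Quantize sample-to-sample differences with a uniform step size."""
    encoded, prev = [], 0
    for s in samples:
        diff = s - prev          # difference from the previous value
        q = round(diff / step)   # quantization: precision is lost here
        encoded.append(q)
        prev += q * step         # track the decoder's reconstruction, not s
    return encoded

def dpcm_decode(encoded, step=4):
    out, prev = [], 0
    for q in encoded:
        prev += q * step
        out.append(prev)
    return out

samples = [10, 14, 19, 23, 22, 18]
print(dpcm_decode(dpcm_encode(samples)))  # approximate reconstruction
```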
Solution (Group Exercise)
• Compute Entropy (H)
– H = 2.1 bits
• Build Huffman tree
• Compute average code length
– L = 2.2 bits
• Code “BCCADE” => 11000100111101

Symbol   P(S)   Code
A        0.1    100
B        0.2    110
C        0.4    0
D        0.2    111
E        0.1    101