Skip to content

An educational compression experimentation toolkit built in Go. This CLI tool allows you to experiment with different compression algorithms, analyze their performance, and understand compression theory through hands-on examples. An educational compression experimentation toolkit.

License

Notifications You must be signed in to change notification settings

BaseMax/go-compress-lab

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Go Compress Lab

An educational compression experimentation toolkit built in Go. This CLI tool allows you to experiment with different compression algorithms, analyze their performance, and understand compression theory through hands-on examples.

Features

  • 🔧 Multiple Compression Algorithms:

    • RLE (Run-Length Encoding) - Best for data with consecutive repeated characters
    • Huffman Coding - Variable-length encoding based on character frequency
    • LZ77 - Dictionary-based compression using sliding windows
    • LZW (Lempel-Ziv-Welch) - Dictionary-based compression with dynamic dictionary
  • 📊 Comprehensive Metrics:

    • Compression ratio
    • Space savings percentage
    • Shannon entropy analysis
    • Compression/decompression performance timing
  • 📈 Visualization:

    • ASCII bar charts for compression ratios
    • Entropy visualization with interpretation
    • Side-by-side algorithm comparison
    • Color-coded output for clarity
  • 🎯 Educational Focus:

    • Clean, readable algorithm implementations
    • Detailed metrics to understand compression behavior
    • Support for comparing multiple algorithms

Installation

Prerequisites

  • Go 1.16 or higher

Build from Source

git clone https://github.com/BaseMax/go-compress-lab.git cd go-compress-lab go build -o compress-lab ./cmd/compress-lab

Optionally, install it globally:

go install ./cmd/compress-lab

Usage

Basic Commands

# Compress text with all algorithms (comparison mode) ./compress-lab -text="Hello World" -compare # Compress a file with a specific algorithm ./compress-lab -input=file.txt -algo=huffman # Compress and save to file ./compress-lab -input=file.txt -algo=lzw -output=compressed.bin # Decompress a file ./compress-lab -input=compressed.bin -algo=lzw -decompress -output=original.txt

Command-Line Flags

  • -text - Text string to compress (alternative to input file)
  • -input - Input file path
  • -output - Output file path for compressed/decompressed data
  • -algo - Algorithm to use: rle, huffman, lz77, lzw, or all
  • -compare - Compare all algorithms (shows detailed metrics)
  • -decompress - Decompress mode instead of compress

Examples

Example 1: Comparing Algorithms on Repetitive Data

./compress-lab -text="AAAABBBCCCCCCDDDDDD" -compare

Output:

Comparing all compression algorithms... Input size: 19 bytes ==================================================================================================== COMPRESSION RESULTS ==================================================================================================== Algorithm | Original | Compressed | Ratio | Savings % | Entropy | Comp Time | Decomp Time ---------------------------------------------------------------------------------------------------- RLE | 19 B | 8 B | 2.38 | 57.89% | 1.94 | 2.284µs | 360ns Huffman | 19 B | 23 B | 0.83 | -21.05% | 1.94 | 13.244µs | 8.937µs LZ77 | 19 B | 25 B | 0.76 | -31.58% | 1.94 | 1.503µs | 1.353µs LZW | 19 B | 22 B | 0.86 | -15.79% | 1.94 | 40.856µs | 36.188µs ==================================================================================================== COMPRESSION RATIO VISUALIZATION ------------------------------------------------------------ RLE | ████████████████████████████████████████ 2.38:1 Huffman | █████████████ 0.83:1 LZ77 | ████████████ 0.76:1 LZW | ██████████████ 0.86:1 ------------------------------------------------------------ ENTROPY ANALYSIS ------------------------------------------------------------ Shannon Entropy: 1.9440 bits/byte Randomness: 24.30% [▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] Interpretation: → Very low entropy: Highly repetitive data, excellent for compression ------------------------------------------------------------ 

Analysis: RLE performs best on this data because it efficiently encodes consecutive runs of identical characters.

Example 2: Natural Language Text

./compress-lab -text="The quick brown fox jumps over the lazy dog." -compare

Analysis: LZW and Huffman typically perform better on natural language text due to repeated patterns and character frequency distribution.

Example 3: File Compression and Decompression

# Compress a file ./compress-lab -input=document.txt -algo=lzw -output=document.lzw # Decompress it ./compress-lab -input=document.lzw -algo=lzw -decompress -output=restored.txt

Example 4: Single Algorithm Analysis

./compress-lab -input=data.txt -algo=huffman

This shows detailed metrics for just the Huffman algorithm.

Understanding the Metrics

Compression Ratio

  • Formula: Original Size / Compressed Size
  • Interpretation: Higher is better. Ratio > 1 means compression, < 1 means expansion.

Space Savings

  • Formula: (1 - Compressed/Original) × 100%
  • Interpretation: Positive percentage = compression, negative = expansion.

Shannon Entropy

  • Range: 0 to 8 bits/byte
  • Interpretation:
    • 0-2: Very low entropy, highly compressible
    • 2-4: Low entropy, good compression potential
    • 4-6: Medium entropy, moderate compression
    • 6-8: High entropy, difficult to compress

Algorithm Overview

RLE (Run-Length Encoding)

  • Best for: Data with long runs of repeated characters
  • How it works: Replaces sequences of repeated bytes with (count, byte) pairs
  • Example: "AAAA" → (4, 'A')

Huffman Coding

  • Best for: Data with non-uniform character distribution
  • How it works: Assigns shorter codes to frequent characters
  • Example: 'e' might be encoded as "01" while 'z' as "1101100"

LZ77

  • Best for: Data with repeated patterns at various distances
  • How it works: Uses a sliding window to find and reference previous occurrences
  • Example: References previous data with (offset, length, next_char) triplets

LZW

  • Best for: Data with repeated substrings
  • How it works: Builds a dictionary of patterns on-the-fly
  • Example: Commonly used in GIF and TIFF formats

Project Structure

go-compress-lab/ ├── cmd/ │ └── compress-lab/ │ └── main.go # CLI application entry point ├── pkg/ │ ├── algorithms/ │ │ ├── compressor.go # Compressor interface │ │ ├── rle.go # RLE implementation │ │ ├── huffman.go # Huffman coding implementation │ │ ├── lz77.go # LZ77 implementation │ │ └── lzw.go # LZW implementation │ ├── metrics/ │ │ └── metrics.go # Compression metrics and calculations │ └── visualization/ │ └── display.go # Terminal output formatting ├── go.mod └── README.md 

Educational Use Cases

  1. Understanding Compression Theory: Compare algorithms on different types of data to see which performs best
  2. Entropy Analysis: Learn about information theory and data randomness
  3. Performance Benchmarking: Measure compression speed vs. ratio tradeoffs
  4. Algorithm Behavior: Observe how each algorithm handles different data patterns

Contributing

Contributions are welcome! This is an educational project, so please keep implementations clear and well-documented.

License

MIT License - See LICENSE file for details

Author

Max Base

Acknowledgments

This project is designed for educational purposes to help understand compression algorithms and their behavior on different types of data.

About

An educational compression experimentation toolkit built in Go. This CLI tool allows you to experiment with different compression algorithms, analyze their performance, and understand compression theory through hands-on examples. An educational compression experimentation toolkit.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published