An educational compression experimentation toolkit built in Go. This CLI tool allows you to experiment with different compression algorithms, analyze their performance, and understand compression theory through hands-on examples.
🔧 Multiple Compression Algorithms:
- RLE (Run-Length Encoding) - Best for data with consecutive repeated characters
- Huffman Coding - Variable-length encoding based on character frequency
- LZ77 - Dictionary-based compression using sliding windows
- LZW (Lempel-Ziv-Welch) - Dictionary-based compression with dynamic dictionary
📊 Comprehensive Metrics:
- Compression ratio
- Space savings percentage
- Shannon entropy analysis
- Compression/decompression performance timing
📈 Visualization:
- ASCII bar charts for compression ratios
- Entropy visualization with interpretation
- Side-by-side algorithm comparison
- Color-coded output for clarity
🎯 Educational Focus:
- Clean, readable algorithm implementations
- Detailed metrics to understand compression behavior
- Support for comparing multiple algorithms
Requirements:
- Go 1.16 or higher
Installation:

```bash
git clone https://github.com/BaseMax/go-compress-lab.git
cd go-compress-lab
go build -o compress-lab ./cmd/compress-lab
```

Optionally, install it globally:
```bash
go install ./cmd/compress-lab
```

Usage:

```bash
# Compress text with all algorithms (comparison mode)
./compress-lab -text="Hello World" -compare

# Compress a file with a specific algorithm
./compress-lab -input=file.txt -algo=huffman

# Compress and save to file
./compress-lab -input=file.txt -algo=lzw -output=compressed.bin

# Decompress a file
./compress-lab -input=compressed.bin -algo=lzw -decompress -output=original.txt
```

Flags:

- `-text` - Text string to compress (alternative to input file)
- `-input` - Input file path
- `-output` - Output file path for compressed/decompressed data
- `-algo` - Algorithm to use: `rle`, `huffman`, `lz77`, `lzw`, or `all`
- `-compare` - Compare all algorithms (shows detailed metrics)
- `-decompress` - Decompress mode instead of compress
```bash
./compress-lab -text="AAAABBBCCCCCCDDDDDD" -compare
```

Output:
```
Comparing all compression algorithms...
Input size: 19 bytes

====================================================================================================
COMPRESSION RESULTS
====================================================================================================
Algorithm | Original | Compressed | Ratio | Savings % | Entropy | Comp Time | Decomp Time
----------------------------------------------------------------------------------------------------
RLE       |     19 B |        8 B |  2.38 |    57.89% |    1.94 |   2.284µs |       360ns
Huffman   |     19 B |       23 B |  0.83 |   -21.05% |    1.94 |  13.244µs |     8.937µs
LZ77      |     19 B |       25 B |  0.76 |   -31.58% |    1.94 |   1.503µs |     1.353µs
LZW       |     19 B |       22 B |  0.86 |   -15.79% |    1.94 |  40.856µs |    36.188µs
====================================================================================================

COMPRESSION RATIO VISUALIZATION
------------------------------------------------------------
RLE     | ████████████████████████████████████████ 2.38:1
Huffman | █████████████ 0.83:1
LZ77    | ████████████ 0.76:1
LZW     | ██████████████ 0.86:1
------------------------------------------------------------

ENTROPY ANALYSIS
------------------------------------------------------------
Shannon Entropy: 1.9440 bits/byte
Randomness: 24.30%
[▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░]
Interpretation:
  → Very low entropy: Highly repetitive data, excellent for compression
------------------------------------------------------------
```

Analysis: RLE performs best on this data because it efficiently encodes consecutive runs of identical characters.
```bash
./compress-lab -text="The quick brown fox jumps over the lazy dog." -compare
```

Analysis: LZW and Huffman typically perform better on natural language text due to repeated patterns and character frequency distribution.
```bash
# Compress a file
./compress-lab -input=document.txt -algo=lzw -output=document.lzw

# Decompress it
./compress-lab -input=document.lzw -algo=lzw -decompress -output=restored.txt
```

To run a single algorithm:

```bash
./compress-lab -input=data.txt -algo=huffman
```

This shows detailed metrics for just the Huffman algorithm.
Compression Ratio:
- Formula: `Original Size / Compressed Size`
- Interpretation: Higher is better. A ratio > 1 means compression; < 1 means expansion.
Space Savings:
- Formula: `(1 - Compressed/Original) × 100%`
- Interpretation: A positive percentage means compression; a negative one means expansion.
Shannon Entropy:
- Range: 0 to 8 bits/byte
- Interpretation:
- 0-2: Very low entropy, highly compressible
- 2-4: Low entropy, good compression potential
- 4-6: Medium entropy, moderate compression
- 6-8: High entropy, difficult to compress
RLE (Run-Length Encoding):
- Best for: Data with long runs of repeated characters
- How it works: Replaces sequences of repeated bytes with (count, byte) pairs
- Example: "AAAA" → (4, 'A')
Huffman Coding:
- Best for: Data with non-uniform character distribution
- How it works: Assigns shorter codes to frequent characters
- Example: 'e' might be encoded as "01" while 'z' as "1101100"
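The code assignment comes from repeatedly merging the two lowest-frequency nodes into a tree, then reading bit paths from the root. A compact sketch (a sort-based stand-in for a priority queue; not the repo's `pkg/algorithms/huffman.go`):

```go
package main

import (
	"fmt"
	"sort"
)

type node struct {
	freq        int
	b           byte
	left, right *node
}

// buildCodes builds a Huffman tree and returns each byte's bit string.
// More frequent bytes end up closer to the root, i.e. with shorter codes.
func buildCodes(data []byte) map[byte]string {
	freq := map[byte]int{}
	for _, b := range data {
		freq[b]++
	}
	var nodes []*node
	for b, f := range freq {
		nodes = append(nodes, &node{freq: f, b: b})
	}
	for len(nodes) > 1 {
		// Merge the two least frequent nodes (a heap would be faster).
		sort.Slice(nodes, func(i, j int) bool { return nodes[i].freq < nodes[j].freq })
		merged := &node{freq: nodes[0].freq + nodes[1].freq, left: nodes[0], right: nodes[1]}
		nodes = append(nodes[2:], merged)
	}
	codes := map[byte]string{}
	var walk func(n *node, prefix string)
	walk = func(n *node, prefix string) {
		if n.left == nil { // leaf
			codes[n.b] = prefix
			return
		}
		walk(n.left, prefix+"0")
		walk(n.right, prefix+"1")
	}
	walk(nodes[0], "")
	return codes
}

func main() {
	// Frequencies: A:8 B:4 C:2 D:1 — A gets the shortest code.
	codes := buildCodes([]byte("AAAAAAAABBBBCCD"))
	fmt.Println(codes['A'], codes['B'], codes['C'], codes['D'])
	// prints: 1 01 001 000
}
```

The negative savings for Huffman in the 19-byte example come from header overhead: the code table must be stored alongside the payload, which dominates on tiny inputs.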
LZ77:
- Best for: Data with repeated patterns at various distances
- How it works: Uses a sliding window to find and reference previous occurrences
- Example: References previous data with (offset, length, next_char) triplets
LZW (Lempel-Ziv-Welch):
- Best for: Data with repeated substrings
- How it works: Builds a dictionary of patterns on-the-fly
- Used in: GIF and TIFF image formats, among others
Project structure:

```
go-compress-lab/
├── cmd/
│   └── compress-lab/
│       └── main.go            # CLI application entry point
├── pkg/
│   ├── algorithms/
│   │   ├── compressor.go      # Compressor interface
│   │   ├── rle.go             # RLE implementation
│   │   ├── huffman.go         # Huffman coding implementation
│   │   ├── lz77.go            # LZ77 implementation
│   │   └── lzw.go             # LZW implementation
│   ├── metrics/
│   │   └── metrics.go         # Compression metrics and calculations
│   └── visualization/
│       └── display.go         # Terminal output formatting
├── go.mod
└── README.md
```

- Understanding Compression Theory: Compare algorithms on different types of data to see which performs best
- Entropy Analysis: Learn about information theory and data randomness
- Performance Benchmarking: Measure compression speed vs. ratio tradeoffs
- Algorithm Behavior: Observe how each algorithm handles different data patterns
Contributions are welcome! This is an educational project, so please keep implementations clear and well-documented.
MIT License - See LICENSE file for details
Max Base
This project is designed for educational purposes to help understand compression algorithms and their behavior on different types of data.