Skip to content

A powerful subtitle file converter that ensures proper UTF-8 encoding. Supports multiple subtitle formats with a simple command-line interface and flexible configuration.

Notifications You must be signed in to change notification settings

onyxdevs/subzilla

Repository files navigation

SubZilla 🦎

A powerful subtitle file converter that ensures proper UTF-8 encoding with robust support for Arabic and other languages. SubZilla automatically detects the input file encoding and converts it to UTF-8, making it perfect for fixing subtitle encoding issues. Built with SOLID, YAGNI, KISS, and DRY principles in mind.

Features ✨

  • Automatic encoding detection.
  • Converts subtitle files to UTF-8.
  • Supports multiple subtitle formats (.srt, .sub, .txt).
  • Strong support for Arabic and other non-Latin scripts.
  • Simple command-line interface.
  • Batch processing with glob pattern support.
  • Parallel processing for better performance.
  • Preserves original file formatting.
  • Creates backup of original files.

Installation πŸš€

Prerequisites

  • Node.js (v14 or higher)
  • Yarn package manager

Global Installation

# Install globally using yarn yarn global add subzilla # Or using npm npm install -g subzilla

Local Development Setup

# Clone the repository git clone https://github.com/onyxdevs/subzilla.git cd subzilla # Install dependencies (installs all workspace packages) yarn install # Build all packages yarn build # Run the CLI yarn start # Development mode (watch for changes) yarn dev

Usage πŸ’»

Basic Usage

# Convert a single subtitle file subzilla convert path/to/subtitle.srt # The converted file will be saved as path/to/subtitle.utf8.srt # Strip HTML formatting subzilla convert input.srt --strip-html # Strip color codes subzilla convert input.srt --strip-colors # Strip style tags subzilla convert input.srt --strip-styles # Replace URLs with [URL] subzilla convert input.srt --strip-urls # Strip all formatting subzilla convert input.srt --strip-all # Create backup and strip formatting subzilla convert input.srt -b --strip-all # Create numbered backups instead of overwriting existing backup subzilla convert input.srt -b --no-overwrite-backup # Combine multiple strip options subzilla convert input.srt --strip-html --strip-colors

Batch Processing

Convert multiple subtitle files at once using glob patterns:

# Convert all .srt files in current directory subzilla batch "*.srt" # Convert files recursively in all subdirectories subzilla batch "**/*.srt" -r # Convert multiple formats subzilla batch "**/*.{srt,sub,txt}" -r # Specify output directory subzilla batch "**/*.srt" -o converted/ # Process files in parallel for better performance subzilla batch "**/*.srt" -p # Skip existing UTF-8 files subzilla batch "**/*.srt" -s # Combine basic options for maximum efficiency subzilla batch "**/*.{srt,sub,txt}" -r -p -s -o converted/ # Advanced Directory Processing # Limit recursive depth to 2 levels subzilla batch "**/*.srt" -r -d 2 # Only process files in specific directories subzilla batch "**/*.srt" -r -i "movies" "series" # Exclude specific directories subzilla batch "**/*.srt" -r -x "temp" "backup" # Preserve directory structure in output subzilla batch "**/*.srt" -r -o converted/ --preserve-structure # Complex example combining all features subzilla batch "**/*.{srt,sub,txt}" -r -p -s -o converted/ \ -d 3 -i "movies" "series" -x "temp" "backup" --preserve-structure # Strip formatting in batch mode subzilla batch "**/*.srt" -r --strip-all # Strip specific formatting in batch mode subzilla batch "**/*.srt" -r --strip-html --strip-colors # Create backups and strip formatting subzilla batch "**/*.srt" -r -b --strip-all # Create numbered backups instead of overwriting existing ones subzilla batch "**/*.srt" -r -b --no-overwrite-backup --strip-all # Complex example with formatting options subzilla batch "**/*.{srt,sub,txt}" -r -p -s -o converted/ \ -d 3 -i "movies" "series" -x "temp" "backup" \ --preserve-structure --strip-all -b

Options:

  • -o, --output-dir <dir>: Save converted files to specified directory.
  • -r, --recursive: Search for files in subdirectories.
  • -p, --parallel: Process files in parallel (faster for many files).
  • -s, --skip-existing: Skip files that already have a UTF-8 version.
  • -d, --max-depth <depth>: Maximum directory depth for recursive search.
  • -i, --include-dirs <dirs...>: Only process files in these directories.
  • -x, --exclude-dirs <dirs...>: Exclude files in these directories.
  • --preserve-structure: Preserve directory structure in output.
  • -b, --backup: Create backup of original files.
  • --no-overwrite-backup: Create numbered backups instead of overwriting existing backup.
  • --strip-html: Strip HTML tags.
  • --strip-colors: Strip color codes.
  • --strip-styles: Strip style tags.
  • --strip-urls: Replace URLs with [URL].
  • --strip-all: Strip all formatting (equivalent to all strip options).

Features:

  • Progress bar showing conversion status.
  • Per-directory progress tracking.
  • Detailed statistics after completion.
  • Error tracking and reporting.
  • Parallel processing support.
  • Skip existing files option.
  • Time tracking and performance metrics.
  • Directory structure preservation.
  • Directory filtering and depth control.
  • HTML tag stripping.
  • Color code removal.
  • Style tag removal.
  • URL replacement.
  • Whitespace normalization.
  • Original file backup.

Example Output:

πŸ” Found 25 files in 5 directories... Converting |==========| 100% | 25/25 | Total Progress Converting |==========| 100% | 8/8 | Processing movies Converting |==========| 100% | 7/7 | Processing series/season1 Converting |==========| 100% | 5/5 | Processing series/season2 Converting |==========| 100% | 3/3 | Processing series/specials Converting |==========| 100% | 2/2 | Processing extras πŸ“Š Batch Processing Summary: ━━━━━━━━━━━━━━━━━━━━━━━━━━ Total files processed: 25 Directories processed: 5 βœ… Successfully converted: 23 ❌ Failed: 1 ⏭️ Skipped: 1 ⏱️ Total time: 5.32s ⚑ Average time per file: 0.22s πŸ“‚ Directory Statistics: ━━━━━━━━━━━━━━━━━━━━ movies: Total: 8 βœ… Success: 8 ❌ Failed: 0 ⏭️ Skipped: 0 series/season1: Total: 7 βœ… Success: 6 ❌ Failed: 1 ⏭️ Skipped: 0 series/season2: Total: 5 βœ… Success: 5 ❌ Failed: 0 ⏭️ Skipped: 0 series/specials: Total: 3 βœ… Success: 2 ❌ Failed: 0 ⏭️ Skipped: 1 extras: Total: 2 βœ… Success: 2 ❌ Failed: 0 ⏭️ Skipped: 0 ❌ Errors: ━━━━━━━━━ series/season1/broken.srt: Failed to detect encoding 

Backup Management

SubZilla provides flexible backup options to protect your original files:

# Basic backup creation subzilla convert input.srt -b # By default, subsequent runs overwrite the existing backup # First run: creates input.srt.bak # Second run: overwrites input.srt.bak (clean, no accumulation) # Create numbered backups instead (legacy behavior) subzilla convert input.srt -b --no-overwrite-backup # First run: creates input.srt.bak # Second run: creates input.srt.bak.1 # Third run: creates input.srt.bak.2 # Configure backup behavior in config file # .subzillarc: # output: # createBackup: true # overwriteBackup: false # Creates numbered backups

Backup Behavior Summary:

  • overwriteBackup: true (default): Clean backup management - always overwrites existing backup
  • overwriteBackup: false: Legacy behavior - creates numbered backups (.bak.1, .bak.2, etc.)
  • CLI override: Use --no-overwrite-backup to temporarily disable backup overwriting

Advanced Options

# Specify output file (single file conversion) subzilla convert input.srt -o output.srt # Get help subzilla --help # Get version subzilla --version # Get help for specific command subzilla convert --help subzilla batch --help

Configuration πŸ”§

SubZilla supports flexible configuration through YAML files and environment variables. All settings are optional with sensible defaults.

Configuration Files

SubZilla looks for configuration files in the following order:

  1. Path specified via --config option
  2. .subzillarc in the current directory
  3. .subzilla.yml or .subzilla.yaml
  4. subzilla.config.yml or subzilla.config.yaml

Example Configurations

Several example configurations are provided in the examples/config directory:

  1. Full Configuration (.subzillarc):

    input: encoding: auto # auto, utf8, utf16le, utf16be, ascii, windows1256 format: auto # auto, srt, sub, ass, ssa, txt output: directory: ./converted # Output directory path createBackup: true # Create backup of original files overwriteBackup: true # Overwrite existing backup files (default: true) format: srt # Output format encoding: utf8 # Always UTF-8 bom: false # Add BOM to output files lineEndings: lf # lf, crlf, or auto # ... and more settings
  2. Minimal Configuration (minimal.subzillarc):

    input: encoding: auto format: auto output: directory: ./converted createBackup: true overwriteBackup: true # Overwrite existing backup files format: srt strip: html: true colors: true styles: true batch: recursive: true parallel: true skipExisting: true preserveStructure: true # Maintain directory structure chunkSize: 5
  3. Performance-Optimized (performance.subzillarc):

    output: createBackup: false # Skip backups overwriteBackup: true # When backups are created, overwrite existing ones overwriteInput: true # Overwrite input files overwriteExisting: true # Don't check existing files batch: parallel: true preserveStructure: false # Flat output structure chunkSize: 20 # Larger chunks retryCount: 0 # No retries failFast: true # Stop on first error
  4. Arabic-Optimized (arabic.subzillarc):

    input: encoding: windows1256 # Common Arabic encoding output: bom: true # Add BOM for compatibility lineEndings: crlf # Windows line endings batch: includeDirectories: - arabic - Ω…Ψ³Ω„Ψ³Ω„Ψ§Ψͺ - أفلام

Environment Variables

You can also configure SubZilla using environment variables. Copy .env.example to .env and modify as needed:

# Input Settings SUBZILLA_INPUT_ENCODING=utf8 SUBZILLA_INPUT_FORMAT=srt SUBZILLA_INPUT_DEFAULT_LANGUAGE=ar # Output Settings SUBZILLA_OUTPUT_DIRECTORY=./output SUBZILLA_OUTPUT_CREATE_BACKUP=true # Complex settings use JSON SUBZILLA_STRIP='{"html":true,"colors":true,"styles":true}' SUBZILLA_BATCH_INCLUDE_DIRECTORIES='["movies","series"]'

Configuration Priority

Settings are merged in the following order (later ones override earlier ones):

  1. Default values.
  2. Configuration file.
  3. Environment variables.
  4. Command-line arguments.

Available Options

Input Options

  • encoding: Input file encoding (auto, utf8, utf16le, utf16be, ascii, windows1256).
  • format: Input format (auto, srt, sub, ass, ssa, txt).

Output Options

  • directory: Output directory path.
  • createBackup: Create backup of original files.
  • overwriteBackup: Overwrite existing backup files (default: true).
  • format: Output format.
  • encoding: Output encoding (always utf8).
  • bom: Add BOM to output files.
  • lineEndings: Line ending style (lf, crlf, auto).
  • overwriteInput: Overwrite input files.
  • overwriteExisting: Overwrite existing files.

Strip Options

  • html: Remove HTML tags.
  • colors: Remove color codes.
  • styles: Remove style tags.
  • urls: Replace URLs with [URL].
  • timestamps: Replace timestamps with [TIMESTAMP].
  • numbers: Replace numbers with #.
  • punctuation: Remove punctuation.
  • emojis: Replace emojis with [EMOJI].
  • brackets: Remove brackets.

Batch Options

  • recursive: Process subdirectories.
  • parallel: Process files in parallel.
  • skipExisting: Skip existing UTF-8 files.
  • maxDepth: Maximum directory depth.
  • includeDirectories: Only process these directories.
  • excludeDirectories: Skip these directories.
  • preserveStructure: Maintain directory structure.
  • chunkSize: Files per batch.
  • retryCount: Number of retry attempts.
  • retryDelay: Delay between retries (ms).
  • failFast: Stop on first error.

Architecture πŸ—οΈ

SubZilla follows a modular monorepo architecture with clear separation of concerns:

Package Dependencies

@subzilla/cli β”œβ”€β”€ @subzilla/core β”‚ └── @subzilla/types └── @subzilla/types 
  • @subzilla/types: Foundation package with no dependencies
  • @subzilla/core: Depends on types, provides core functionality
  • @subzilla/cli: Depends on both core and types, provides user interface

Key Design Principles

  • SOLID Principles: Single responsibility, open/closed, Liskov substitution, interface segregation, dependency inversion
  • YAGNI: You Aren't Gonna Need It - avoid over-engineering
  • KISS: Keep It Simple, Stupid - prioritize simplicity and clarity
  • DRY: Don't Repeat Yourself - shared code in appropriate packages

TypeScript Project References

The monorepo uses TypeScript project references for:

  • Faster incremental builds
  • Better IDE support
  • Proper dependency tracking
  • Type-safe cross-package imports

Development πŸ› οΈ

Project Structure

SubZilla is organized as a Yarn Workspaces monorepo with three main packages:

subzilla/ β”œβ”€β”€ packages/ β”‚ β”œβ”€β”€ cli/ # @subzilla/cli - Command-line interface β”‚ β”‚ β”œβ”€β”€ src/ β”‚ β”‚ β”‚ β”œβ”€β”€ commands/ # CLI command implementations β”‚ β”‚ β”‚ β”œβ”€β”€ constants/# Shared CLI options β”‚ β”‚ β”‚ β”œβ”€β”€ registry/ # Command registration system β”‚ β”‚ β”‚ β”œβ”€β”€ utils/ # CLI utilities β”‚ β”‚ β”‚ └── main.ts # CLI entry point β”‚ β”‚ └── package.json β”‚ β”œβ”€β”€ core/ # @subzilla/core - Core processing logic β”‚ β”‚ β”œβ”€β”€ src/ β”‚ β”‚ β”‚ β”œβ”€β”€ utils/ # Output strategies β”‚ β”‚ β”‚ β”œβ”€β”€ *.ts # Core services and processors β”‚ β”‚ β”‚ └── index.ts # Package exports β”‚ β”‚ └── package.json β”‚ └── types/ # @subzilla/types - TypeScript definitions β”‚ β”œβ”€β”€ src/ β”‚ β”‚ β”œβ”€β”€ cli/ # CLI-related types β”‚ β”‚ β”œβ”€β”€ core/ # Core functionality types β”‚ β”‚ β”œβ”€β”€ index.ts # Main exports β”‚ β”‚ └── validation.ts # Zod schemas β”‚ └── package.json β”œβ”€β”€ examples/ # Configuration examples β”œβ”€β”€ package.json # Workspace root configuration └── tsconfig.json # TypeScript project references 

Package Documentation

Each package has comprehensive documentation:

Testing πŸ§ͺ

SubZilla includes a comprehensive Jest testing framework with 83 passing tests across all packages:

# Run all tests yarn test # Test specific package yarn workspace @subzilla/core test yarn workspace @subzilla/cli test yarn workspace @subzilla/types test

Test Coverage:

  • @subzilla/types (13 tests): Zod schema validation, configuration validation
  • @subzilla/core (57 tests): Encoding detection/conversion, formatting stripping, end-to-end processing
  • @subzilla/cli (13 tests): Command registration, CLI parsing, error handling

Key Features:

  • Multi-project Jest setup with TypeScript support
  • Real file system testing with temporary directories
  • CLI integration tests using execSync
  • Proper TypeScript mocking with generic type annotations
  • Arabic text encoding tests for Windows-1256 support
  • CI/CD integration with GitHub Actions

Available Scripts

Workspace-level scripts:

  • yarn build: Build all packages in dependency order
  • yarn start: Run the SubZilla CLI
  • yarn dev: Development mode with watch for all packages
  • yarn test: Run tests across all packages
  • yarn type-check: TypeScript type checking for all packages
  • yarn lint: Run linter across all packages
  • yarn lint:fix: Fix linting issues across all packages
  • yarn format: Format code using Prettier across all packages
  • yarn format:check: Check code formatting across all packages
  • yarn clean: Clean all build artifacts

Package-specific scripts:

# Build specific package yarn workspace @subzilla/core build # Run CLI directly yarn workspace @subzilla/cli start # Develop specific package yarn workspace @subzilla/types dev

Monorepo Benefits

The workspace structure provides several advantages:

  • Shared Dependencies: Common dependencies are hoisted to the root, reducing duplication
  • Type Safety: Cross-package imports are fully type-checked at compile time
  • Atomic Changes: Related changes across packages can be made in a single commit
  • Consistent Tooling: Shared linting, formatting, and build configurations
  • Simplified Development: Single yarn install and yarn build for the entire project

Contributing

  1. Fork the repository

  2. Clone your fork and install dependencies

    git clone https://github.com/your-username/subzilla.git cd subzilla yarn install
  3. Create your feature branch

    git checkout -b feature/amazing-feature
  4. Make your changes

    • Follow the existing code style and patterns
    • Add tests for new functionality
    • Update documentation as needed
    • Ensure all packages build successfully: yarn build
  5. Test your changes

    yarn build yarn test yarn lint yarn type-check
  6. Commit your changes

    git commit -m 'Add some amazing feature'
  7. Push to your branch

    git push origin feature/amazing-feature
  8. Open a Pull Request

Development Workflow

# Start development mode (watches all packages) yarn dev # Build specific package yarn workspace @subzilla/core build # Test specific package yarn workspace @subzilla/cli test # Run CLI during development yarn start --help # Clean and rebuild everything yarn clean yarn build

License πŸ“

This project is licensed under the ISC License - see the LICENSE file for details.

Support πŸ’ͺ

If you encounter any issues or have questions, please:

  1. Check the issues page
  2. Create a new issue if your problem isn't already listed
  3. Provide as much detail as possible, including:
    • SubZilla version
    • Node.js version
    • Operating system
    • Sample subtitle file (if possible)

Acknowledgments πŸ™

  • Thanks to all contributors.
  • Inspired by the need for better subtitle encoding support.
  • Built with TypeScript and Node.js.

Further Enhancements πŸš€

Planned improvements and feature additions:

  1. Enhanced Format Support

    • Add support for .ass and .ssa subtitle formats
    • Handle multiple subtitle files in batch
    • Support subtitle format conversion (SRT ↔ ASS ↔ SSA)
    • Add WebVTT format support
    • Support subtitle timing synchronization
  2. User Interface & Experience

    • Interactive CLI mode with comprehensive commands
    • Progress bars for batch operations
    • Create a web interface for browser-based conversion
    • Build a native macOS app using Electron
    • Add drag-and-drop GUI interface
    • Implement real-time encoding preview
  3. Performance & Reliability

    • Parallel processing for batch operations
    • Configurable chunk size for parallel processing
    • Retry mechanism for failed conversions
    • Batch processing progress tracking and statistics
    • Memory usage optimization for large files
    • Streaming processing for very large subtitle files
    • Performance benchmarking and profiling tools
    • Caching mechanism for repeated operations
  4. Advanced Features

    • Comprehensive subtitle validation with Zod schemas
    • Extensive formatting stripping (HTML, colors, styles, emojis)
    • Subtitle timing adjustment and synchronization
    • Subtitle merging and splitting
    • Character encoding preview and detection confidence
    • JSON/CSV export for batch processing results
    • AI-powered subtitle translation integration
    • Subtitle quality analysis and scoring
  5. Developer Experience & Infrastructure

    • Comprehensive test suite (83 tests across all packages)
    • TypeScript monorepo with project references
    • Detailed API documentation for all packages
    • Configuration examples and templates
    • GitHub Actions CI/CD workflow
    • Automated release management
    • Performance regression testing
    • Docker containerization
    • Plugin system for custom processors
    • Webhook integration for automated workflows
  6. Integration & Ecosystem

    • VS Code extension for subtitle editing
    • API server mode for remote processing
    • Integration with popular media players
    • Cloud storage integration (S3, Google Drive, Dropbox)
    • Batch processing via file watching
    • Integration with subtitle databases (OpenSubtitles, etc.)

Want to contribute to these enhancements? Check our Contributing section!

About

A powerful subtitle file converter that ensures proper UTF-8 encoding. Supports multiple subtitle formats with a simple command-line interface and flexible configuration.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •