
Commit 2e77319

Add Ghostscript-based PDF compressor and update docs (fixes #129)
- Add pdf_compressor_ghostscript.py using open-source Ghostscript
- Update README.md with both legacy and recommended methods
- Update requirements.txt to note system dependencies
- Fixes issue #129: PDFTron/PDFNet is now commercial and requires license
- Provides free alternative with same functionality and API
1 parent 8fca152 commit 2e77319

3 files changed: +157 -8 lines changed

Lines changed: 47 additions & 7 deletions
@@ -1,8 +1,48 @@
 # [How to Compress PDF Files in Python](https://www.thepythoncode.com/article/compress-pdf-files-in-python)
-To run this:
-- `pip3 install -r requirements.txt`
-- To compress `bert-paper.pdf` file:
-```
-$ python pdf_compressor.py bert-paper.pdf bert-paper-min.pdf
-```
-This will spawn a new compressed PDF file under the name `bert-paper-min.pdf`.
+
+This directory contains two approaches:
+
+- Legacy (commercial): `pdf_compressor.py` uses PDFTron/PDFNet. PDFNet now requires a license key and the old pip package is not freely available, so this may not work without a license.
+- Recommended (open source): `pdf_compressor_ghostscript.py` uses Ghostscript to compress PDFs.
+
+## Ghostscript method (recommended)
+
+Prerequisite: Install Ghostscript
+
+- macOS (Homebrew):
+  - `brew install ghostscript`
+- Ubuntu/Debian:
+  - `sudo apt-get update && sudo apt-get install -y ghostscript`
+- Windows:
+  - Download and install from https://ghostscript.com/releases/
+  - Ensure `gswin64c.exe` (or `gswin32c.exe`) is in your PATH.
+
+No Python packages are required for this method, only Ghostscript.
+
+### Usage
+
+To compress `bert-paper.pdf` into `bert-paper-min.pdf` with default quality (`power=2`):
+
+```
+python pdf_compressor_ghostscript.py bert-paper.pdf bert-paper-min.pdf
+```
+
+Optional quality level `[power]` controls compression/quality tradeoff (maps to Ghostscript `-dPDFSETTINGS`):
+
+- 0 = `/screen` (smallest, lowest quality)
+- 1 = `/ebook` (good quality)
+- 2 = `/printer` (high quality) [default]
+- 3 = `/prepress` (very high quality)
+- 4 = `/default` (Ghostscript default)
+
+Example:
+
+```
+python pdf_compressor_ghostscript.py bert-paper.pdf bert-paper-min.pdf 1
+```
+
+In testing, `bert-paper.pdf` (~757 KB) compressed to ~407 KB with `power=1`.
+
+## Legacy PDFNet method (requires license)
+
+If you have a valid license and the PDFNet SDK installed, you can use the original `pdf_compressor.py` script. Note that the previously referenced `PDFNetPython3` pip package is not freely available and may not install via pip. Refer to the vendor's documentation for installation and licensing.
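A possible follow-up to the level table in the README, not part of this commit: the script resolves `power` with `settings_map.get(power, '/printer')`, so a value outside 0-4 silently falls back to `/printer`. The sketch below shows a stricter lookup. The names `PDFSETTINGS_BY_POWER` and `preset_for_power` are hypothetical and used only for illustration.

```python
# Hypothetical helper, not part of the commit: strict lookup of the
# Ghostscript -dPDFSETTINGS preset for a documented power level.
PDFSETTINGS_BY_POWER = {
    0: "/screen",
    1: "/ebook",
    2: "/printer",
    3: "/prepress",
    4: "/default",
}


def preset_for_power(power: int) -> str:
    """Return the preset for `power`, rejecting levels the README does not list."""
    try:
        return PDFSETTINGS_BY_POWER[power]
    except KeyError:
        valid = ", ".join(str(level) for level in PDFSETTINGS_BY_POWER)
        raise ValueError(f"power must be one of {valid}; got {power!r}") from None


print(preset_for_power(1))  # prints /ebook
```

Raising on an unknown level makes a typo such as `power=11` visible instead of quietly producing `/printer` output. Whether that is preferable to the lenient fallback is a design choice for the maintainers.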
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+import os
+import sys
+import subprocess
+import shutil
+
+
+def get_size_format(b, factor=1024, suffix="B"):
+    for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
+        if b < factor:
+            return f"{b:.2f}{unit}{suffix}"
+        b /= factor
+    return f"{b:.2f}Y{suffix}"
+
+
+def find_ghostscript_executable():
+    candidates = [
+        shutil.which('gs'),
+        shutil.which('gswin64c'),
+        shutil.which('gswin32c'),
+    ]
+    for c in candidates:
+        if c:
+            return c
+    return None
+
+
+def compress_file(input_file: str, output_file: str, power: int = 2):
+    """Compress PDF using Ghostscript.
+
+    power:
+        0 -> /screen (lowest quality, highest compression)
+        1 -> /ebook (good quality)
+        2 -> /printer (high quality) [default]
+        3 -> /prepress (very high quality)
+        4 -> /default (Ghostscript default)
+    """
+    if not os.path.exists(input_file):
+        raise FileNotFoundError(f"Input file not found: {input_file}")
+    if not output_file:
+        output_file = input_file
+
+    initial_size = os.path.getsize(input_file)
+
+    gs = find_ghostscript_executable()
+    if not gs:
+        raise RuntimeError(
+            "Ghostscript not found. Install it and ensure 'gs' (Linux/macOS) "
+            "or 'gswin64c'/'gswin32c' (Windows) is in PATH."
+        )
+
+    settings_map = {
+        0: '/screen',
+        1: '/ebook',
+        2: '/printer',
+        3: '/prepress',
+        4: '/default',
+    }
+    pdfsettings = settings_map.get(power, '/printer')
+
+    cmd = [
+        gs,
+        '-sDEVICE=pdfwrite',
+        '-dCompatibilityLevel=1.4',
+        f'-dPDFSETTINGS={pdfsettings}',
+        '-dNOPAUSE',
+        '-dBATCH',
+        '-dQUIET',
+        f'-sOutputFile={output_file}',
+        input_file,
+    ]
+
+    try:
+        subprocess.run(cmd, check=True)
+    except subprocess.CalledProcessError as e:
+        print(f"Ghostscript failed: {e}")
+        return False
+
+    compressed_size = os.path.getsize(output_file)
+    ratio = 1 - (compressed_size / initial_size)
+    summary = {
+        "Input File": input_file,
+        "Initial Size": get_size_format(initial_size),
+        "Output File": output_file,
+        "Compressed Size": get_size_format(compressed_size),
+        "Compression Ratio": f"{ratio:.3%}",
+    }
+
+    print("## Summary ########################################################")
+    for k, v in summary.items():
+        print(f"{k}: {v}")
+    print("###################################################################")
+    return True
+
+
+if __name__ == '__main__':
+    if len(sys.argv) < 3:
+        print("Usage: python pdf_compressor_ghostscript.py <input.pdf> <output.pdf> [power 0-4]")
+        sys.exit(1)
+    input_file = sys.argv[1]
+    output_file = sys.argv[2]
+    power = int(sys.argv[3]) if len(sys.argv) > 3 else 2
+    ok = compress_file(input_file, output_file, power)
+    sys.exit(0 if ok else 2)
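One point a reviewer might raise: when `output_file` is empty, `compress_file` falls back to `output_file = input_file`, so Ghostscript would read from and write to the same path, which is likely to truncate or corrupt the input. The sketch below shows one way to handle in-place compression through a temporary file. It is not part of this commit. The names `compress_in_place`, `build_cmd`, and `gs_cmd` are hypothetical, and keeping the original whenever the output is not smaller is an assumption about the desired behaviour.

```python
import os
import subprocess
import tempfile


def gs_cmd(src, dst):
    # Example command builder reusing the flags from the commit; assumes
    # the executable is reachable as "gs".
    return ["gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
            "-dPDFSETTINGS=/printer", "-dNOPAUSE", "-dBATCH", "-dQUIET",
            f"-sOutputFile={dst}", src]


def compress_in_place(path, build_cmd=gs_cmd, run=subprocess.run):
    """Compress `path` via a temporary file, then swap it into place.

    `build_cmd(src, dst)` returns the argument list to execute. `run` is
    injectable so the flow can be exercised without Ghostscript installed.
    Returns True if the file was replaced, False if it was left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=directory)
    os.close(fd)
    try:
        run(build_cmd(path, tmp_path), check=True)
        new_size = os.path.getsize(tmp_path)
        # Assumption: keep the original if the output is empty or not smaller.
        if new_size == 0 or new_size >= os.path.getsize(path):
            return False
        os.replace(tmp_path, path)
        return True
    finally:
        # Remove the temp file on failure or when the original was kept.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With Ghostscript installed, `compress_in_place("bert-paper.pdf")` would overwrite the file only after a smaller output has been written successfully. A failed run raises `subprocess.CalledProcessError` and leaves the original intact.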
Lines changed: 7 additions & 1 deletion
@@ -1 +1,7 @@
-PDFNetPython3==8.1.0
+# No Python dependencies required for Ghostscript-based compressor.
+# System dependency: Ghostscript
+# - macOS: brew install ghostscript
+# - Debian: sudo apt-get install -y ghostscript
+# - Windows: https://ghostscript.com/releases/
+#
+# The legacy script (pdf_compressor.py) depends on PDFNet (commercial) and a license key.
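Because `requirements.txt` can only describe the Ghostscript dependency in comments, `pip` will not catch a missing install. The sketch below shows how a script could check for the executable at startup and print a platform-specific hint. It is not part of this commit. The names `INSTALL_HINTS`, `install_hint`, and `require_ghostscript` are hypothetical, and the hint text is taken from the comments above.

```python
import shutil
import sys

# Install hints copied from the requirements.txt comments; the Linux entry
# assumes a Debian-style system with apt-get.
INSTALL_HINTS = {
    "darwin": "brew install ghostscript",
    "linux": "sudo apt-get install -y ghostscript",
    "win32": "download the installer from https://ghostscript.com/releases/",
}


def install_hint(platform=sys.platform):
    """Return an install hint for a sys.platform-style string."""
    for prefix, hint in INSTALL_HINTS.items():
        if platform.startswith(prefix):
            return hint
    return "see https://ghostscript.com/releases/"


def require_ghostscript(which=shutil.which, platform=sys.platform):
    """Return the Ghostscript executable path or raise with an install hint.

    `which` is injectable so the lookup can be tested without Ghostscript.
    """
    for name in ("gs", "gswin64c", "gswin32c"):
        path = which(name)
        if path:
            return path
    raise RuntimeError(f"Ghostscript not found; try: {install_hint(platform)}")
```

Calling `require_ghostscript()` before any compression work would surface the missing system dependency with an actionable message.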
