CommonForms

🪄 Automatically convert a PDF into a fillable form.

💻 Hosted Models (detect.semanticdocs.org) | 📄 CommonForms Paper | 🤗 Dataset | 🤗 FFDNet-L | 🤗 FFDNet-S

This repo contains three things:

the pip-installable commonforms package, which has a CLI and API for converting PDFs into fillable forms
the FFDNet-S and FFDNet-L models from the paper CommonForms: A Large, Diverse Dataset for Form Field Detection
the preprocessing code for the CommonForms dataset, which is hosted on HuggingFace: https://huggingface.co/datasets/jbarrow/CommonForms

Installation

CommonForms can be installed with either uv or pip; use whichever package manager you prefer:

uv pip install commonforms

or

pip install commonforms

Once it's installed, you should be able to run the CLI on almost any PDF.

CommonForms CLI

The simplest usage will run inference on your CPU using the default suggested settings:

commonforms <input.pdf> <output.pdf>

Command Line Arguments

Argument	Type	Default	Description
`input`	Path	Required	Path to the input PDF file
`output`	Path	Required	Path to save the output PDF file
`--model`	str	`FFDNet-L`	Model name (FFDNet-L/FFDNet-S) or path to custom .pt file
`--keep-existing-fields`	flag	`False`	Keep existing form fields in the PDF
`--use-signature-fields`	flag	`False`	Use signature fields instead of text fields for detected signatures
`--device`	str	`cpu`	Device for inference (e.g., `cpu`, `cuda`, `0`)
`--image-size`	int	`1600`	Image size for inference
`--confidence`	float	`0.3`	Confidence threshold for detection
`--fast`	flag	`False`	Trade some accuracy for speed on CPU; inference runs in about half the time
`--multiline`	flag	`False`	Allow multiline input in detected text boxes
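
The flags in the table can be combined in a single invocation. As a minimal sketch, the Python script below assembles a commonforms command from the documented flags and runs it once per PDF in a folder. The helper names build_command and convert_folder are hypothetical and not part of the package, and the folder names in the usage comment are placeholders.

```python
import subprocess
from pathlib import Path


def build_command(input_pdf, output_pdf, *, model="FFDNet-L", device="cpu",
                  confidence=0.3, fast=False, multiline=False):
    """Assemble a commonforms CLI call from flags listed in the table above."""
    cmd = [
        "commonforms", str(input_pdf), str(output_pdf),
        "--model", model,
        "--device", device,
        "--confidence", str(confidence),
    ]
    # Boolean options are plain flags: present means on, absent means off.
    if fast:
        cmd.append("--fast")
    if multiline:
        cmd.append("--multiline")
    return cmd


def convert_folder(src_dir, dst_dir, **options):
    """Run the CLI once per PDF in src_dir (hypothetical helper)."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_command(pdf, dst / pdf.name, **options), check=True)


# Example call (placeholder folder names):
#   convert_folder("forms", "fillable", model="FFDNet-S", fast=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.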

CommonForms API

In addition to the CLI, you can call the prepare_form function from Python:

from commonforms import prepare_form

prepare_form(
    "path/to/input.pdf",
    "path/to/output.pdf"
)

All of the command line arguments above are also accepted as keyword arguments by the prepare_form function.
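
The exact keyword names are not listed here, so one cautious approach is to check options against the function signature before calling it. The sketch below assumes, without having verified it, that keyword names mirror the CLI flags with hyphens replaced by underscores. flag_to_kwarg and call_checked are hypothetical helpers, not part of commonforms.

```python
import inspect


def flag_to_kwarg(flag):
    """Turn a CLI flag such as '--image-size' into 'image_size' (assumed naming)."""
    return flag.lstrip("-").replace("-", "_")


def call_checked(func, *args, **options):
    """Call func only if its signature accepts every keyword in options."""
    params = inspect.signature(func).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    unknown = [name for name in options if name not in params]
    if unknown and not accepts_any:
        raise TypeError(f"unsupported keyword arguments: {sorted(unknown)}")
    return func(*args, **options)


# Intended use, once commonforms is installed:
#   from commonforms import prepare_form
#   call_checked(prepare_form, "path/to/input.pdf", "path/to/output.pdf",
#                **{flag_to_kwarg("--confidence"): 0.5})
```

If a name turns out not to match, the TypeError lists the rejected keywords before any inference starts, and inspect.signature(prepare_form) shows the names the installed version actually accepts.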

Dataset Prep

🚧 Code for dataset prep exists in the dataset folder.

Citation

If you use the tool, models, or code in an academic paper, please cite the CommonForms paper:

@misc{barrow2025commonforms,
  title         = {CommonForms: A Large, Diverse Dataset for Form Field Detection},
  author        = {Barrow, Joe},
  year          = {2025},
  eprint        = {2509.16506},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  doi           = {10.48550/arXiv.2509.16506},
  url           = {https://arxiv.org/abs/2509.16506}
}

If you use it in a non-academic setting, please reach out to the author (joseph.d.barrow [at] gmail.com)! I love to hear when people are using my work!
