Top 23 Python Arxiv Projects

ChatPaper

1 2 19,052 7.3 Python

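Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow: use ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.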
arxiv-latex-cleaner

2 5 6,619 2.8 Python

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Project mention: LaTeXpOsEd: A Systematic Analysis of Information Leakage in Preprint Archives | news.ycombinator.com | 2025-10-13

I agree with other comments that this research treads a fine, unethical line. Did the authors responsibly disclose this, as is often done in the security research community? I cannot find any mention of it in the paper. The researchers seem to be involved in security-related research (first author is doing a PhD, last author holds a PhD).
At least arxiv could have run the cleaner [1] before the print of this pre-print (lol). If there was no disclosure, then I think this pre-print becomes unethical to put up.
> leading to the identification of nearly 1,200 images containing sensitive metadata. The types of data represented vary significantly. While device information (e.g., the camera used) or software details (such as the exact version of Photoshop) may already raise concerns, in over 600 cases the metadata contained GPS coordinates, potentially revealing the precise location where a photo was taken. In some instances, this could expose a researcher’s home address (when tied to a profile picture) or the location of research facilities (when images capture experimental equipment)
Oof, that's not too great.
[1] https://github.com/google-research/arxiv-latex-cleaner
arxiv-vanity

3 5 1,631 0.0 Python

Renders papers from arXiv as responsive web pages so you don't have to squint at a PDF.
arxiv-sanity-lite

4 19 1,426 0.0 Python

arxiv-sanity lite: tag arXiv papers of interest and get recommendations of similar papers in a nice UI, using SVMs over tf-idf feature vectors based on paper abstracts.

Project mention: My Struggle with Doom Scrolling | news.ycombinator.com | 2025-01-22
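
The arxiv-sanity-lite description mentions tf-idf feature vectors built from paper abstracts. As a rough illustration of that idea only, the sketch below builds tf-idf vectors with the standard library and ranks abstracts by cosine similarity to one tagged paper. It deliberately swaps the SVM step for plain cosine similarity, and the function names and sample abstracts are assumptions made for this example, not code from the project.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of abstracts each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on an empty abstract
        vectors.append({
            # Smoothed idf, so a term present in every abstract keeps a small weight.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(docs, tagged_index):
    """Indices of the other documents, most similar to the tagged one first."""
    vectors = tfidf_vectors(docs)
    others = [i for i in range(len(docs)) if i != tagged_index]
    return sorted(others, key=lambda i: cosine(vectors[tagged_index], vectors[i]), reverse=True)


# Hypothetical abstracts, invented for this example.
abstracts = [
    "graph neural networks for molecules",
    "neural networks on molecular graphs",
    "stellar population synthesis in galaxies",
]
ranking = rank_similar(abstracts, tagged_index=0)
```

A real recommender would also need proper tokenization and, as the project description says, a classifier trained on the tagged papers rather than a single-paper similarity lookup.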
arxiv.py

5 2 1,414 4.6 Python

Python wrapper for the arXiv API
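
For context on what such a wrapper sits on top of, here is a minimal sketch of the underlying HTTP interface, assuming the public arXiv query endpoint and its Atom responses. It does not use arxiv.py itself; the helper names and the sample feed are hypothetical, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by Atom feeds


def build_query_url(search_query, start=0, max_results=10):
    """Build a query URL from the search_query, start and max_results parameters."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_titles(atom_xml):
    """Extract entry titles from an Atom feed, collapsing internal whitespace."""
    root = ET.fromstring(atom_xml)
    return [
        " ".join(entry.findtext(f"{ATOM}title", default="").split())
        for entry in root.iter(f"{ATOM}entry")
    ]


# Hand-written sample feed for illustration; a real response has many more fields.
SAMPLE_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><title>First  paper</title></entry>"
    "<entry><title>Second paper</title></entry>"
    "</feed>"
)

url = build_query_url("all:electron", max_results=5)
titles = parse_titles(SAMPLE_FEED)
```

To fetch live results, the URL could be passed to `urllib.request.urlopen` and the response body to `parse_titles`; a wrapper library typically adds paging, rate limiting, and typed result objects on top of this.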
resp

6 4 459 4.8 Python

Fetch Academic Research Papers from different sources
ArxivDigest

7 2 384 0.6 Python

ArXiv Digest and Personalized Recommendations using Large Language Models
paper2remarkable

8 4 371 7.8 Python

Fetch an academic paper or web article and send it to the reMarkable tablet with a single command
summarizepaper

9 2 303 7.4 Python

An AI-powered arXiv paper summarization website with a virtual assistant for answering questions.
findpapers

10 1 293 4.0 Python

Findpapers: A tool for helping researchers who are looking for related works
bibcure

11 1 204 10.0 Python

Bibcure helps in boring tasks by keeping your bibfile up to date and normalized...also allows you to easily download all papers inside your bibtex
searchthearxiv

12 2 164 4.2 Python

The code powering searchthearxiv.com, a simple semantic search engine for more than 300,000 ML papers on arXiv.

Project mention: Semantic search engine for ArXiv, biorxiv and medrxiv | news.ycombinator.com | 2025-05-20
pdf2doi

13 2 130 6.7 Python

A python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a pdf file.
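
One common way to pull a DOI out of already-extracted text is a regular expression over the `10.<registrant>/<suffix>` shape. The sketch below illustrates that general technique under that assumption; it is not taken from pdf2doi, the function name is hypothetical, and the DOIs in the sample string are placeholders.

```python
import re

# A widely used approximation of DOI syntax: "10.", a 4-9 digit registrant code,
# a slash, then a suffix drawn from a conservative character set.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text):
    """Return the distinct DOI-like strings in text, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Sentence punctuation often sticks to the end of a DOI, so trim it.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


# Placeholder DOIs, invented for this example.
sample = "See doi:10.1000/xyz123. Also https://doi.org/10.1000/xyz123 and 10.5555/abc.def-9,"
dois = find_dois(sample)
```

Trimming a trailing `)` is a heuristic and would truncate a DOI whose suffix legitimately ends in a parenthesis, so a production tool would likely validate candidates against a resolver.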
cobib

14 1 65 9.4 Python

Console Bibliography
Auto-Research

15 1 58 0.0 Python

Generate a custom, detailed survey paper with topic-clustered sections and proper citations from a single query in just under 30 minutes.
ocr-benchmark

16 2 45 6.6 Python

Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments

Project mention: Gemini beats everyone on new OCR benchmark | news.ycombinator.com | 2025-02-14

If you're wondering how they prompt the models:
"Perform OCR on this image. Return only the text found in the image as a single continuous string without any newlines, additional text, or commentary. Separate words with single spaces. For any truncated, partially visible, or occluded text, include only the visible portions without attempting to complete or guess the full text. If no text is present, return empty double quotes."
Found in: https://github.com/video-db/ocr-benchmark/blob/main/prompts....
Muzero-unplugged

17 3 34 2.1 Python

PyTorch implementation of MuZero Unplugged for gym environments. The algorithm supports a wide range of action and observation spaces, both discrete and continuous.
ailert

18 2 28 7.0 Python

An open-source platform that aggregates AI content from 230+ sources including research papers, GitHub trends, and industry news, making AI knowledge accessible to everyone.

Project mention: Building an Open-Source AI Newsletter Engine: The Story of AiLert | dev.to | 2025-01-12

Code: https://github.com/anuj0456/ailert Docs: https://github.com/anuj0456/ailert/blob/main/README.md
Paper-Recommendation-System

19 2 22 10.0 Python

Web interface to search ArXiv papers using NLP Sentence-Transformers, Faiss and Streamlit
Muzero

20 1 18 10.0 Python

PyTorch implementation of MuZero for gym environments. It supports any Discrete, Box, and Box2D configuration for the action space and observation space. (by DHDev0)
arxiv-to-prompt

21 1 16 6.8 Python

Transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs questions about the paper.

Project mention: Show HN: iOS app (and CLI) for turning ArXiv papers into LLM-ready LaTeX prompts | news.ycombinator.com | 2025-08-16

https://github.com/takashiishida/arxiv-to-prompt
I’ve been using the CLI tool daily to help me quickly understand arXiv papers. This downloads the arXiv source files, finds the main `\documentclass` file, and flattens everything into one coherent LaTeX source (people usually have multiple .tex files in a single paper by using `\input` and `\include`). It also has options to remove comments and appendices to shorten prompts.
I often used the CLI tool on my laptop, but since I commute by train in Tokyo I wanted something I could use on my phone. That's why I built the iOS app.
With equation-heavy papers, it may be better to provide the precise latex notation instead of providing the PDF. Uploading PDFs is also more difficult and time consuming (especially on the phone).
Thanks for reading! This is my first iOS app, and I’d appreciate your thoughts!
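
The flattening step described in the post above (find the `\documentclass` file, then inline `\input` and `\include`) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the tool's actual code: files are held in an in-memory dict, all names are hypothetical, and the comment pattern ignores edge cases such as `\\%` and verbatim environments.

```python
import re

INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")
COMMENT_RE = re.compile(r"(?<!\\)%.*")  # an unescaped % up to the end of the line


def find_main(files):
    """Return the name of the first file (alphabetically) containing \\documentclass."""
    for name, text in sorted(files.items()):
        if r"\documentclass" in text:
            return name
    raise ValueError("no file contains \\documentclass")


def flatten(files, name, strip_comments=False, _stack=()):
    """Recursively replace \\input{...} and \\include{...} with the file contents."""
    # LaTeX lets authors omit the .tex extension.
    if name not in files and name + ".tex" in files:
        name = name + ".tex"
    if name in _stack:
        raise ValueError(f"circular include: {name}")
    text = files[name]
    if strip_comments:
        # Stripping first also prevents commented-out \input lines from expanding.
        text = COMMENT_RE.sub("", text)
    return INPUT_RE.sub(
        lambda m: flatten(files, m.group(1).strip(), strip_comments, _stack + (name,)),
        text,
    )


# Hypothetical paper sources, invented for this example.
files = {
    "main.tex": "\\documentclass{article}\n\\begin{document}\n"
                "\\input{intro}\n\\include{sections/method.tex}\n\\end{document}\n",
    "intro.tex": "Intro text. % note to self\n",
    "sections/method.tex": "Method uses 50\\% of data.\n",
}
flat = flatten(files, find_main(files), strip_comments=True)
```

With `strip_comments=False`, a commented-out `\input` would still be expanded, which is one reason to strip comments before substituting.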
arxiv-mcp-server

22 1 12 3.8 Python

MCP server for arXiv.org - Search, analyze, and export academic papers with AI assistants. Features advanced paper discovery, citation analysis, trend tracking, and multi-format exports.

Project mention: ArXiv-Mcp-Server | news.ycombinator.com | 2025-08-03
neozot-py

23 1 8 10.0 Python

get recommendations from arxiv based on your zotero library

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Arxiv related posts

LaTeXpOsEd: A Systematic Analysis of Information Leakage in Preprint Archives

1 project | news.ycombinator.com | 13 Oct 2025
ArXiv LaTeX Cleaner: Clean the LaTeX code of your paper to submit to ArXiv

3 projects | news.ycombinator.com | 31 Jan 2025
My Struggle with Doom Scrolling

1 project | news.ycombinator.com | 22 Jan 2025
Hardware Acceleration of LLMs: A comprehensive survey and comparison

1 project | news.ycombinator.com | 6 Sep 2024
Show HN: FileKitty – Combine and label text files for LLM prompt contexts

5 projects | news.ycombinator.com | 1 May 2024
Show HN: Command Line Data Aggregation Tool for LLM Ingestion

1 project | news.ycombinator.com | 4 Apr 2024
Show HN: Talk to any ArXiv paper just by changing the URL

5 projects | news.ycombinator.com | 20 Dec 2023

Index

What are some of the best open-source Arxiv projects in Python? This list will help you:

#	Project	Stars
1	ChatPaper	19,052
2	arxiv-latex-cleaner	6,619
3	arxiv-vanity	1,631
4	arxiv-sanity-lite	1,426
5	arxiv.py	1,414
6	resp	459
7	ArxivDigest	384
8	paper2remarkable	371
9	summarizepaper	303
10	findpapers	293
11	bibcure	204
12	searchthearxiv	164
13	pdf2doi	130
14	cobib	65
15	Auto-Research	58
16	ocr-benchmark	45
17	Muzero-unplugged	34
18	ailert	28
19	Paper-Recommendation-System	22
20	Muzero	18
21	arxiv-to-prompt	16
22	arxiv-mcp-server	12
23	neozot-py	8
