Top 23 Python Computer Vision Projects
- Project mention: Show HN: Real-time privacy protection for smart glasses | news.ycombinator.com | 2025-08-11
Did you look at EgoBlur? It's a lot more effective at face detection than https://github.com/ageitgey/face_recognition. Granted, you'd have to do your own face matching to make exceptions.
- Project mention: Why DETRs are replacing YOLOs for real-time object detection | news.ycombinator.com | 2025-11-22
> The YOLO series is developed and maintained by Ultralytics. All YOLO code and weights are released under the AGPL-3.0 license.
The original author of YOLO and the Darknet framework [1] issued the code under pretty much every license you wish to use [2]. My preferred fork by AlexeyAB is under an equally permissive license [3].
Ultralytics then created their own model under the AGPL-3.0 license [4], which probably would never stand up in court, as they have models from the likes of YOLOv3 in their source [5].
This entire article is flawed anyway, because they don't state which YOLOv11 model they are using or compare the accuracy. They appear to have just taken the pre-trained models and assumed it's apples-to-apples. They could have at least compared YOLO11n/s/m/l/x.
[1] https://pjreddie.com/darknet/yolo/
[2] https://github.com/pjreddie/darknet
[3] https://github.com/AlexeyAB/darknet
[4] https://github.com/ultralytics/ultralytics
[5] https://github.com/ultralytics/ultralytics/tree/main/ultraly...
- Project mention: Show HN: Plug-and-play Python utils for any computer-vision pipeline | news.ycombinator.com | 2025-07-21
- EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
- d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
- vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch.
- datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
Hugging Face Datasets is the library that lets you download and manage datasets from the Hugging Face Hub; it also serves as a convenient vendor-neutral interface for your own datasets.
- gaussian-splatting
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Project mention: Show HN: Blurry – Host, share, and embed Gaussian Splatting models | news.ycombinator.com | 2025-05-29
Hey HN,
I noticed that there are a few tools to create and edit 3D Gaussian Splatting[1] models, but not so many tools for robust hosting, sharing, and presentation of these models, so I built Blurry!
Unlike other platforms, I’m specifically focusing on the best 3DGS in-browser viewer experience you can seamlessly embed on websites or Notion docs, with more places coming soon!
Some of the potential use cases are professional training, product and rental space marketing, and construction business. But at this stage with Blurry, I’m specifically targeting people/businesses that already use 3D scanning to a certain degree but are lacking an easy-to-use and performant hosting platform for their 3DGS models.
I’m shipping new features and improvements very quickly. Two big things on the current roadmap are first-person camera controls (especially for indoor splats) and support for really large models (possibly by streaming the model as you move around).
Would love your thoughts and feedback pls!
[1] For those who don’t know, 3D Gaussian Splatting is a fairly new method to reconstruct a 3D model from real pictures and videos (https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/).
-
ConvNeXT's architecture contains an AdaptiveAvgPool2d layer: https://github.com/pytorch/vision/blob/5f03dc524bdb7529bb4f2...
This means that you can split your image into tiles, process each tile individually, average the results, apply a final classification layer to the average and get exactly the same result. For reference, see the demonstration below.
You could of course do exactly the same thing with a vision transformer instead of a convolutional neural network.
That being said, architecture is wildly overemphasized in my opinion. Data is everything.
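The pooling step behind the tiling claim above can be checked without a model. What follows is a minimal sketch in plain Python, assuming equal-sized, non-overlapping tiles and a made-up single-channel feature map; the function names are hypothetical and this is not the commenter's own demonstration. With an output size of 1, `AdaptiveAvgPool2d` takes the mean over all spatial positions of each channel, so the mean of the per-tile means equals the mean over the whole map. The sketch covers only the pooling step: convolutions with padding can produce different features near tile borders, so full-model outputs may not match bit for bit.

```python
from statistics import fmean


def global_avg_pool(feature_map):
    """Mean over all spatial positions of one channel (global average pooling)."""
    return fmean(value for row in feature_map for value in row)


def split_into_tiles(feature_map, tile_h, tile_w):
    """Split a 2D grid into non-overlapping tiles of identical size."""
    height, width = len(feature_map), len(feature_map[0])
    if height % tile_h or width % tile_w:
        raise ValueError("grid must divide evenly into tiles")
    return [
        [row[left:left + tile_w] for row in feature_map[top:top + tile_h]]
        for top in range(0, height, tile_h)
        for left in range(0, width, tile_w)
    ]


# Hypothetical 4x6 single-channel feature map with made-up values 0..23.
feature_map = [[float(r * 6 + c) for c in range(6)] for r in range(4)]

# Pool the whole map at once.
pooled_whole = global_avg_pool(feature_map)

# Pool each 2x3 tile separately, then average the per-tile results.
tiles = split_into_tiles(feature_map, tile_h=2, tile_w=3)
pooled_tiles = fmean(global_avg_pool(tile) for tile in tiles)

print(pooled_whole, pooled_tiles)  # both are 11.5
```

The equality depends on every tile having the same number of positions; with unequal tiles the per-tile means would need to be weighted by tile area.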
import torch, torchvision.models
- labelme
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
Project mention: Convert LabelMe Annotations to YOLO Format with labelme-to-yolo | dev.to | 2024-12-29
Convert LabelMe format into YOLOv7 format for instance segmentation.
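To illustrate what such a conversion involves, here is a minimal sketch in plain Python. It assumes the usual LabelMe JSON fields (`imageWidth`, `imageHeight`, and `shapes` with `label`, `shape_type`, and `points`) and the common YOLO segmentation label layout of one line per object: a class index followed by polygon coordinates normalized to the image size. The function name, the class mapping, and the sample annotation are hypothetical, and this is not the code of the labelme-to-yolo tool.

```python
import json


def labelme_to_yolo_lines(labelme_doc, class_ids):
    """Convert LabelMe polygon shapes to YOLO segmentation label lines."""
    width = labelme_doc["imageWidth"]
    height = labelme_doc["imageHeight"]
    lines = []
    for shape in labelme_doc["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # this sketch handles polygons only
        coords = []
        for x, y in shape["points"]:
            # Normalize pixel coordinates to the [0, 1] range.
            coords.append(f"{x / width:.6f}")
            coords.append(f"{y / height:.6f}")
        lines.append(" ".join([str(class_ids[shape["label"]])] + coords))
    return lines


# Hypothetical annotation, inlined as JSON so the example runs without files.
example = json.loads("""
{
  "imageWidth": 200,
  "imageHeight": 100,
  "shapes": [
    {"label": "cat", "shape_type": "polygon",
     "points": [[20, 10], [100, 10], [100, 50]]}
  ]
}
""")

yolo_lines = labelme_to_yolo_lines(example, class_ids={"cat": 0})
print("\n".join(yolo_lines))
# 0 0.100000 0.100000 0.500000 0.100000 0.500000 0.500000
```

In practice each image gets its own `.txt` label file, and the label-to-index mapping has to match the class list used for training.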
- OpenCLIP: Open Source Implementation
- pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
- segmentation_models.pytorch
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
- Project mention: Launch HN: Enhanced Radar (YC W25) – A safety net for air traffic control | news.ycombinator.com | 2025-03-04
Are there already "bird / not a bird" datasets?
Procedures for creating "bird on Multispectral plane radar and video" dataset(s):
Tag birds on the dashcam video with timecoded sensor data and a segmentation and annotation tool.
Pinch to zoom, auto-edge detect, classification probability, sensor status
voxel51/fiftyone does segmentation and annotation with video and possibly Multispectral data: https://github.com/voxel51/fiftyone
- Project mention: Gluon: a GPU programming language based on the same compiler stack as Triton | news.ycombinator.com | 2025-09-17
Amazon (+ Microsoft) already released a language for ML called gluon 8 years ago: https://aws.amazon.com/blogs/aws/introducing-gluon-a-new-lib...
autogluon is popular as well: https://github.com/autogluon/autogluon
- U-2-Net
The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."
Python Computer Vision discussion
Python Computer Vision related posts
- Show HN: Dinotool – a foundation model vector embedding CLI
- Deep Dive: Building Real-Time Facial Emotion Detection on Raspberry Pi with YOLOv11
- Segment Anything 3
- Comic Translate: Unleash the Power of GPT-4 for Flawless Global Comic Translation
- Show HN: Open-Source LaTeX OCR, Alternative to Mathpix/SimpleTex
- Show HN: Automate Robot Data Quality Improvement
- A toolkit for improving the quality of your LeRobot datasets
- A note from our sponsor - InfluxDB www.influxdata.com | 22 Dec 2025
Index
What are some of the best open-source Computer Vision projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Face Recognition | 55,756 |
| 2 | ultralytics | 50,002 |
| 3 | supervision | 36,171 |
| 4 | EasyOCR | 28,597 |
| 5 | d2l-en | 27,350 |
| 6 | pytorch-CycleGAN-and-pix2pix | 24,804 |
| 7 | vit-pytorch | 24,698 |
| 8 | datasets | 21,001 |
| 9 | gaussian-splatting | 19,908 |
| 10 | vision | 17,390 |
| 11 | labelme | 15,398 |
| 12 | facenet | 14,207 |
| 13 | open_clip | 13,140 |
| 14 | fashion-mnist | 12,555 |
| 15 | pytorch-grad-cam | 12,468 |
| 16 | ludwig | 11,636 |
| 17 | segmentation_models.pytorch | 11,182 |
| 18 | nerfstudio | 11,009 |
| 19 | Kornia | 10,938 |
| 20 | fiftyone | 10,160 |
| 21 | autogluon | 9,716 |
| 22 | U-2-Net | 9,572 |
| 23 | RobustVideoMatting | 9,112 |