Top 23 Python image-classification Projects
-
ultralytics
Project mention: Why DETRs are replacing YOLOs for real-time object detection | news.ycombinator.com | 2025-11-22
> The YOLO series is developed and maintained by Ultralytics. All YOLO code and weights are released under the AGPL-3.0 license.
The original author of YOLO and the Darknet framework [1] issued the code under pretty much every license you wish to use [2]. My preferred fork by AlexeyAB is under an equally permissive license [3].
Ultralytics then created their own model under the AGPL-3.0 license [4], which probably would never stand up in a court as they have the model from the likes of YOLOv3 in their source [5].
This entire article is flawed anyway, because they don't state which YOLOv11 model they are using or compare the accuracy. They appear to have just taken the pre-trained models and assumed it's apples-to-apples. They could have at least compared YOLO11n/s/m/l/x.
[1] https://pjreddie.com/darknet/yolo/
[2] https://github.com/pjreddie/darknet
[3] https://github.com/AlexeyAB/darknet
[4] https://github.com/ultralytics/ultralytics
[5] https://github.com/ultralytics/ultralytics/tree/main/ultraly...
-
pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
-
vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
-
Swin-Transformer
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
-
pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
-
fiftyone
Project mention: Launch HN: Enhanced Radar (YC W25) – A safety net for air traffic control | news.ycombinator.com | 2025-03-04
Are there already bird/not-a-bird datasets?
Procedure for creating "bird on multispectral plane radar and video" dataset(s):
Tag birds in the dashcam video, together with timecoded sensor data, using a segmentation and annotation tool.
Pinch to zoom, automatic edge detection, classification probability, sensor status.
voxel51/fiftyone does segmentation and annotation with video and possibly multispectral data: https://github.com/voxel51/fiftyone
-
InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Project mention: InternVL3: Unified Multimodal AI Training Outperforms Open-Source Rivals | dev.to | 2025-04-17
InternVL3 marks a significant advancement in the InternVL model series, implementing a native multimodal pre-training approach that fundamentally transforms how vision-language models learn. Unlike most leading multimodal large language models (MLLMs) that adapt text-only models to handle visual inputs through complex post-hoc alignment, InternVL3 jointly acquires multimodal and linguistic capabilities in a single unified pre-training stage.
-
gluon-cv
Project mention: Gluon: a GPU programming language based on the same compiler stack as Triton | news.ycombinator.com | 2025-09-17
-
autodistill
Images to inference with no labeling (use foundation models to train supervised models).
We (Roboflow) have had early access to this model for the past few weeks. It's really, really good. This feels like a seminal moment for computer vision. I think there's a real possibility this launch goes down in history as "the GPT Moment" for vision.
The two areas I think this model is going to be transformative in the immediate term are for rapid prototyping and distillation.
Two years ago we released autodistill[1], an open source framework that uses large foundation models to create training data for training small realtime models. I'm convinced the idea was right, but too early; there wasn't a big model good enough to be worth distilling from back then. SAM3 is finally that model (and will be available in Autodistill today).
We are also taking a big bet on SAM3 and have built it into Roboflow as an integral part of the entire build and deploy pipeline[2], including a brand new product called Rapid[3], which reimagines the computer vision pipeline in a SAM3 world. It feels really magical to go from an unlabeled video to a fine-tuned realtime segmentation model with minimal human intervention in just a few minutes (and we rushed the release of our new SOTA realtime segmentation model[4] last week because it's the perfect lightweight complement to the large & powerful SAM3).
We also have a playground[5] up where you can play with the model and compare it to other VLMs.
[1] https://github.com/autodistill/autodistill
[2] https://blog.roboflow.com/sam3/
[3] https://rapid.roboflow.com
[4] https://github.com/roboflow/rf-detr
[5] https://playground.roboflow.com
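The idea described in the comment above (a large foundation model labels raw data, and a small, fast model is then trained on those labels) can be illustrated with a toy sketch. This is not autodistill's API: `teacher_label`, `auto_label`, `train_student`, the 2-D points, and the `"object"`/`"background"` labels are all hypothetical stand-ins chosen so the example runs with the standard library only.

```python
import random

def teacher_label(point):
    """Hypothetical stand-in for a large foundation model (slow but accurate)."""
    x, y = point
    return "object" if x + y > 1.0 else "background"

def auto_label(unlabeled, teacher):
    """Step 1: the teacher annotates raw data, so no human labeling is needed."""
    return [(p, teacher(p)) for p in unlabeled]

def train_student(labeled):
    """Step 2: fit a tiny nearest-centroid classifier on the teacher's labels."""
    sums = {}
    for (x, y), label in labeled:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    centroids = {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

    def student(point):
        px, py = point
        # Predict the label whose centroid is closest to the point.
        return min(
            centroids,
            key=lambda lab: (px - centroids[lab][0]) ** 2 + (py - centroids[lab][1]) ** 2,
        )

    return student

rng = random.Random(0)  # fixed seed so the run is repeatable
unlabeled = [(rng.random(), rng.random()) for _ in range(500)]
student = train_student(auto_label(unlabeled, teacher_label))

# Measure how often the small model reproduces the teacher on unseen points.
held_out = [(rng.random(), rng.random()) for _ in range(200)]
agreement = sum(student(p) == teacher_label(p) for p in held_out) / len(held_out)
print(f"student agrees with teacher on {agreement:.0%} of held-out points")
```

In a real pipeline the teacher would be a large vision model and the student a realtime detector or segmenter; the toy only shows the data flow from unlabeled inputs to a trained small model.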
-
fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.
-
Unsupervised-Classification
SCAN: Learning to Classify Images without Labels, incl. SimCLR. [ECCV 2020]
-
involution
[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator
-
SimMIM
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
Python image-classification related posts
-
Show HN: Energy-Efficient NAS via RBF Kernel Scoring (No GPU Training Needed)
-
Show HN: Local, automatic, image keywords, captions using metadata for storage
-
Alternatives to Cosine Similarity
-
I made a social media app
-
Samsung expected to report 80% profit plunge as losses mount at chip business
-
Is it easier to go from Pytorch to TF and Keras than the other way around?
-
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
Index
What are some of the best open-source image-classification projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | ultralytics | 50,002 |
| 2 | pytorch-image-models | 36,033 |
| 3 | vit-pytorch | 24,698 |
| 4 | Swin-Transformer | 15,313 |
| 5 | pytorch-grad-cam | 12,468 |
| 6 | fiftyone | 10,160 |
| 7 | InternVL | 9,625 |
| 8 | gluon-cv | 5,917 |
| 9 | PaddleClas | 5,768 |
| 10 | mmpretrain | 3,757 |
| 11 | hub | 3,523 |
| 12 | catalyst | 3,366 |
| 13 | autodistill | 2,549 |
| 14 | ailia-models | 2,296 |
| 15 | efficientnet | 2,097 |
| 16 | fastdup | 1,802 |
| 17 | pytorch-toolbelt | 1,562 |
| 18 | Unsupervised-Classification | 1,450 |
| 19 | poolformer | 1,356 |
| 20 | private-detector | 1,339 |
| 21 | involution | 1,314 |
| 22 | cleanvision | 1,134 |
| 23 | SimMIM | 995 |