Probably the best curated list of data science software in Python
Contents
- Machine Learning
- Deep Learning
- Reinforcement Learning
- Graph Machine Learning
- Probabilistic Graphical Models
- Probabilistic Methods
- Data Manipulation
- Feature Engineering
- Visualization
- Deployment
- Model Explanation
- Genetic Programming
- Optimization
- Time Series
- Natural Language Processing
- Computer Audition
- Computer Vision
- Statistics
- Distributed Computing
- Experimentation
- Data Validation
- Evaluation
- Computations
- Web Scraping
- Spatial Analysis
- Quantum Computing
- Conversion
- Related Resources
- Contributing
- License
- scikit-learn - Machine learning in Python.  
- Shogun - Machine learning toolbox.
- xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package.
- cuML - RAPIDS Machine Learning Library.    
- modAL - Modular active learning framework for Python 3.
- Sparkit-learn - PySpark + scikit-learn = Sparkit-learn.    
- mlpack - A scalable C++ machine learning library (Python bindings).
- dlib - Toolkit for making real-world machine learning and data analysis applications in C++ (Python bindings).
- MLxtend - Extension and helper modules for Python's data analysis and machine learning libraries.  
- hyperlearn - Rewritten scikit-learn and Statsmodels with GPU support, claimed to be 50%+ faster and to use 50%+ less RAM.
- Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans.  
- scikit-multilearn - Multi-label classification for Python.
- seqlearn - Sequence classification toolkit for Python.  
- pystruct - Simple structured learning framework for Python.  
- sklearn-expertsys - Highly interpretable classifiers for scikit-learn.
- RuleFit - Implementation of the RuleFit algorithm.
- metric-learn - Metric learning algorithms in Python.  
- pyGAM - Generalized Additive Models in Python.
- causalml - Uplift modeling and causal inference with machine learning algorithms.  
- TPOT - Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.  
- auto-sklearn - An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.  
- MLBox - A powerful Automated Machine Learning Python library.
- AutoKeras - AutoML library for deep learning.
- AutoGluon - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
- ML-Ensemble - High performance ensemble learning.  
- Stacking - Simple and useful stacking library written in Python.  
- stacked_generalization - Library for machine learning stacking generalization.  
- vecstack - Python package for stacking (machine learning technique).  
- imbalanced-learn - Module to perform under-sampling and over-sampling with various techniques.  
- imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data.    
- rpforest - A forest of random projection trees.  
- sklearn-random-bits-forest - Wrapper of the Random Bits Forest program written by Wang et al. (2016).
- rgf_python - Python Wrapper of Regularized Greedy Forest.  
- Python Extreme Learning Machine (ELM) - A machine learning technique used for classification/regression tasks.
- hpelm - High-performance implementation of Extreme Learning Machines (fast randomized neural networks).  
- pyFM - Factorization machines in Python.
- fastFM - A library for Factorization Machines.  
- tffm - TensorFlow implementation of an arbitrary order Factorization Machine.    
- liquidSVM - An implementation of SVMs.
- scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API.  
- ThunderSVM - A fast SVM Library on GPUs and CPUs.    
- XGBoost - Scalable, Portable, and Distributed Gradient Boosting.    
- LightGBM - A fast, distributed, high-performance gradient boosting framework.
- CatBoost - An open-source library for gradient boosting on decision trees.
- ThunderGBM - Fast GBDTs and Random Forests on GPUs.    
- NGBoost - Natural Gradient Boosting for Probabilistic Prediction.
- TensorFlow Decision Forests - A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models in Keras.    
- PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration.  
- pytorch-lightning - PyTorch Lightning is just organized PyTorch.  
- ignite - High-level library to help with training neural networks in PyTorch.  
- skorch - A scikit-learn compatible neural network library that wraps PyTorch.    
- Catalyst - High-level utils for PyTorch DL & RL research.  
- ChemicalX - A PyTorch-based deep learning library for drug pair scoring.  
- TensorFlow - Computation using data flow graphs for scalable machine learning by Google.  
- TensorLayer - Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
- TFLearn - Deep learning library featuring a higher-level API for TensorFlow.  
- Sonnet - TensorFlow-based neural network library.  
- tensorpack - A Neural Net Training Interface on TensorFlow.  
- Polyaxon - A platform that helps you build, manage and monitor deep learning models.  
- tfdeploy - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running numpy.  
- tensorflow-upstream - TensorFlow ROCm port.    
- TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow.  
- TensorLight - A high-level framework for TensorFlow.  
- Mesh TensorFlow - Model Parallelism Made Easier.  
- Ludwig - A toolbox that allows one to train and test deep learning models without the need to write code.  
- Keras - A high-level neural networks API running on top of TensorFlow.  
- keras-contrib - Keras community contributions.  
- Hyperas - Keras + Hyperopt: A straightforward wrapper for convenient hyperparameter optimization.
- Elephas - Distributed Deep learning with Keras & Spark.  
- qkeras - A quantization deep learning library.  
- MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dependency Scheduler.
- Gluon - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet).  
- Xfer - Transfer Learning library for Deep Neural Networks.  
- MXNet (ROCm) - HIP port of MXNet.
- jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.
- transformers - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
- Tangent - Source-to-Source Debuggable Derivatives in Pure Python.
- autograd - Efficiently computes derivatives of NumPy code.
- Myia - Deep Learning framework (pre-alpha).
- nnabla - Neural Network Libraries by Sony.
- Caffe - A fast open framework for deep learning.
- Gymnasium - An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym).
- Stable Baselines3 - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
- RLlib - Scalable Reinforcement Learning.
- Acme - A library of reinforcement learning components and agents.
- Catalyst-RL - PyTorch framework for RL research.
- d3rlpy - An offline deep reinforcement learning library.
- Tianshou - An elegant PyTorch deep reinforcement learning library.
- TF-Agents - A library for Reinforcement Learning in TensorFlow.  
- TensorForce - A TensorFlow library for applied reinforcement learning.  
- TRFL - TensorFlow Reinforcement Learning.  
- Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
- keras-rl - Deep Reinforcement Learning for Keras.  
- garage - A toolkit for reproducible reinforcement learning research.
- Horizon - A platform for Applied Reinforcement Learning.
- pytorch_geometric - Geometric Deep Learning Extension Library for PyTorch.  
- pytorch_geometric_temporal - Temporal Extension Library for PyTorch Geometric.  
- dgl - Python package built to ease deep learning on graphs, on top of existing DL frameworks.
- Spektral - Deep learning on graphs.
- Karate Club - An unsupervised machine learning library for graph-structured data.
- Little Ball of Fur - A library for sampling graph-structured data.
- pomegranate - Probabilistic and graphical models for Python.  
- pgmpy - A Python library for working with Probabilistic Graphical Models.
- pyAgrum - A GRaphical Universal Modeler.
- pyro - A flexible, scalable deep probabilistic programming library built on PyTorch.  
- PyMC - Bayesian Stochastic Modelling in Python.
- ZhuSuan - Bayesian Deep Learning.  
- GPflow - Gaussian processes in TensorFlow.  
- InferPy - Deep Probabilistic Modelling Made Easy.  
- PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
- sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API.  
- skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute.  
- PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch.  
- emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
- hsmmlearn - A library for hidden semi-Markov models with explicit durations.
- pyhsmm - Bayesian inference in HSMMs and HMMs.
- GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch.  
- sklearn-crfsuite - A scikit-learn-inspired API for CRFsuite.  
- Featuretools - Automated feature engineering.
- Feature Engine - Feature engineering package with sklearn-like functionality.  
- OpenFE - Automated feature generation with expert-level performance.
- skl-groups - A scikit-learn addon to operate on set/"group"-based features.  
- Feature Forge - A set of tools for creating and testing machine learning features.  
- few - A feature engineering wrapper for sklearn.  
- scikit-mdr - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.  
- tsfresh - Automatic extraction of relevant features from time series.  
- dirty_cat - Machine learning on dirty tabular data (especially string-based variables for classification and regression).
- NitroFE - Moving window features.  
- sk-transformer - A collection of various pandas and scikit-learn compatible transformers for all kinds of preprocessing and feature engineering steps.
- scikit-feature - Feature selection repository in Python.
- boruta_py - Implementations of the Boruta all-relevant feature selection method.  
- BoostARoota - A fast xgboost feature selection algorithm.  
- scikit-rebate - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.  
- zoofs - A feature selection library based on evolutionary algorithms.
- Matplotlib - Plotting with Python.
- seaborn - Statistical data visualization using matplotlib.
- prettyplotlib - Painlessly create beautiful matplotlib plots.
- python-ternary - Ternary plotting library for Python with matplotlib.
- missingno - Missing data visualization module for Python.
- chartify - Python library that makes it easy for data scientists to create charts.
- physt - Improved histograms.
- animatplot - A python package for animating plots built on matplotlib.
- plotly - A Python library that makes interactive and publication-quality graphs.
- Bokeh - Interactive Web Plotting for Python.
- Altair - Declarative statistical visualization library for Python. Can perform many data transformations directly in the chart code.
- bqplot - Plotting library for IPython/Jupyter notebooks.
- pyecharts - Python interface for Echarts, an interactive charting and visualization library.
- folium - Makes it easy to visualize data on an interactive OpenStreetMap-based map.
- geemap - Python package for interactive mapping with Google Earth Engine (GEE).
- HoloViews - Stop plotting your data - annotate your data and let it visualize itself.
- AutoViz - Visualize data automatically with one line of code (ideal for machine learning).
- SweetViz - Visualize and compare datasets, target values, and associations with one line of code.
- pyLDAvis - Interactive visualization of topic models.
- fastapi - Modern, fast (high-performance) web framework for building APIs with Python.
- streamlit - Makes it easy to deploy machine learning models as web apps.
- gradio - Create UIs for your machine learning model in Python in 3 minutes.
- datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
- binder - Share Jupyter notebooks as interactive, executable environments.
- dalex - moDel Agnostic Language for Exploration and explanation.   
- Shapley - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
- Alibi - Algorithms for monitoring and explaining machine learning models.
- anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
- aequitas - Bias and Fairness Audit Toolkit.
- Contrastive Explanation - Contrastive Explanation (Foil Trees).  
- yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection.  
- scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects.  
- shap - A unified approach to explain the output of any machine learning model.  
- ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
- Lime - Explaining the predictions of any machine learning classifier.  
- FairML - A Python toolbox for auditing machine learning models for bias.
- L2X - Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation.
- PDPbox - Partial dependence plot toolbox.
- PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
- Skater - Python Library for Model Interpretation.
- model-analysis - Model analysis tools for TensorFlow.  
- themis-ml - A library that implements fairness-aware machine learning algorithms.  
- treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions.  
- AI Explainability 360 - Interpretability and explainability of data and machine learning models.
- Auralisation - Auralisation of learned features in CNN (for audio).
- CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
- lucid - A collection of infrastructure and tools for research in neural network interpretability.
- Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
- FlashLight - Visualization tool for your neural network.
- tensorboard-pytorch - Tensorboard for PyTorch (and chainer, mxnet, numpy, ...).
- mxboard - Logging MXNet data for visualization in TensorBoard.  
- gplearn - Genetic Programming in Python.  
- DEAP - Distributed Evolutionary Algorithms in Python.
- karoo_gp - A Genetic Programming platform for Python with GPU support.  
- monkeys - A strongly-typed genetic programming framework for Python.
- sklearn-genetic - Genetic feature selection module for scikit-learn.  
- Optuna - A hyperparameter optimization framework.
- Spearmint - Bayesian optimization.
- BoTorch - Bayesian optimization in PyTorch.  
- scikit-opt - Heuristic Algorithms for optimization.
- sklearn-genetic-opt - Hyperparameters tuning and feature selection using evolutionary algorithms.  
- SMAC3 - Sequential Model-based Algorithm Configuration.
- Optunity - A library containing various optimizers for hyperparameter tuning.
- hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
- hyperopt-sklearn - Hyper-parameter optimization for sklearn.  
- sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn.  
- sigopt_sklearn - SigOpt wrappers for scikit-learn methods.  
- Bayesian Optimization - A Python implementation of global optimization with Gaussian processes.
- SafeOpt - Safe Bayesian Optimization.
- scikit-optimize - Sequential model-based optimization with a scipy.optimize interface.
- Solid - A comprehensive gradient-free optimization framework written in Python.
- PySwarms - A research toolkit for particle swarm optimization in Python.
- Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
- GPflowOpt - Bayesian Optimization using GPflow.  
- POT - Python Optimal Transport library.
- Talos - Hyperparameter Optimization for Keras Models.
- nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).
- OR-Tools - An open-source software suite for optimization by Google; provides a unified programming interface to a half dozen solvers: SCIP, GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
- sktime - A unified framework for machine learning with time series.  
- darts - A python library for easy manipulation and forecasting of time series.
- statsforecast - Lightning fast forecasting with statistical and econometric models.
- mlforecast - Scalable machine learning-based time series forecasting.
- neuralforecast - Scalable neural network-based time series forecasting.
- tslearn - Machine learning toolkit dedicated to time-series data.  
- tick - Module for statistical learning, with a particular emphasis on time-dependent modeling.  
- greykite - A flexible, intuitive, and fast forecasting library.
- Prophet - Automatic Forecasting Procedure.
- PyFlux - Open source time series library for Python.
- bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
- luminol - Anomaly Detection and Correlation library.
- dateutil - Powerful extensions to the standard datetime module.
- maya - Makes it easy to parse date strings and change time zones.
- Chaos Genius - ML-powered analytics engine for outlier/anomaly detection and root cause analysis.
- torchtext - Data loaders and abstractions for text and NLP.  
- gluon-nlp - NLP made easy.  
- KerasNLP - Modular Natural Language Processing workflows with Keras.  
- spaCy - Industrial-Strength Natural Language Processing.
- NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
- CLTK - The Classical Language Toolkit.
- gensim - Topic Modelling for Humans.
- pyMorfologik - Python binding for Morfologik.
- skift - Scikit-learn wrappers for Python fastText.  
- Phonemizer - Simple text-to-phonemes converter for multiple languages.
- flair - Very simple framework for state-of-the-art NLP.
- torchaudio - An audio library for PyTorch.  
- librosa - Python library for audio and music analysis.
- Yaafe - Audio features extraction.
- aubio - A library for audio and music analysis.
- Essentia - Library for audio and music analysis, description, and synthesis.
- LibXtract - A simple, portable, lightweight library of audio feature extraction functions.
- Marsyas - Music Analysis, Retrieval, and Synthesis for Audio Signals.
- muda - A library for augmenting annotated audio data.
- madmom - Python audio and music signal processing library.
- torchvision - Datasets, Transforms, and Models specific to Computer Vision.  
- gluon-cv - Provides implementations of the state-of-the-art deep learning models in computer vision.  
- KerasCV - Industry-strength Computer Vision workflows with Keras.  
- OpenCV - Open Source Computer Vision Library.
- scikit-image - Image Processing SciKit (Toolbox for SciPy).
- imgaug - Image augmentation for machine learning experiments.
- imgaug_extension - Additional augmentations for imgaug.
- Augmentor - Image augmentation library in Python for machine learning.
- albumentations - Fast image augmentation library and easy-to-use wrapper around other libraries.
- pandas_summary - Extension of the pandas DataFrame describe function.
- Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects.  
- statsmodels - Statistical modeling and econometrics in Python.
- stockstats - Supplies a wrapper, StockDataFrame, based on pandas.DataFrame, with inline stock statistics/indicators support.
- weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
- scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
- Alphalens - Performance analysis of predictive (alpha) stock factors.
- pandas - Powerful Python data analysis toolkit.
- polars - A fast multi-threaded, hybrid-out-of-core DataFrame library.
- koalas - pandas API on Apache Spark.  
- Arctic - High-performance datastore for time series and tick data.
- datatable - Data.table for Python.  
- pandas_profiling - Create HTML profiling reports from pandas DataFrame objects
- cuDF - GPU DataFrame Library.    
- blaze - NumPy and pandas interface to Big Data.  
- pandasql - Allows you to query pandas DataFrames using SQL syntax.  
- pandas-gbq - pandas integration with Google BigQuery.
- xpandas - Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.
- pysparkling - A pure Python implementation of Apache Spark's RDD and DStream interfaces.  
- modin - Speed up your pandas workflows by changing a single line of code.  
- swifter - A package that efficiently applies any function to a pandas dataframe or series in the fastest available manner.
- pandas-log - A package that allows providing feedback about basic pandas operations and finds both business logic and performance issues.
- vaex - Out-of-core DataFrames for Python and ML; visualize and explore big tabular data at a billion rows per second.
- xarray - Xarray combines the best features of NumPy and pandas for multidimensional data selection by supplementing numerical axis labels with named dimensions for more intuitive, concise, and less error-prone indexing routines.
- pdpipe - Easy pipelines for pandas DataFrames.
- SSPipe - Python pipe (|) operator with support for DataFrames, NumPy, and PyTorch.
- pandas-ply - Functional data manipulation for pandas.  
- Dplython - Dplyr for Python.  
- sklearn-pandas - pandas integration with sklearn.    
- Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
- pyjanitor - Clean APIs for data cleaning.  
- meza - A Python toolkit for processing tabular data.
- Prodmodel - Build system for data science pipelines.
- dopanda - Hints and tips for using pandas in an analysis environment.  
- Hamilton - A microframework for dataframe generation that applies Directed Acyclic Graphs specified by a flow of lazily evaluated Python functions.
- cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
- snorkel - A system for quickly generating training data with weak supervision.
- dataprep - Collect, clean, and visualize your data in Python with a few lines of code.
- ydata-synthetic - A package to generate synthetic tabular and time-series data leveraging the state-of-the-art generative models.  
- Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.  
- PySpark - Exposes the Spark programming model to Python.  
- Veles - Distributed machine learning platform.
- Jubatus - Framework and Library for Distributed Online Machine Learning.
- DMTK - Microsoft Distributed Machine Learning Toolkit.
- PaddlePaddle - PArallel Distributed Deep LEarning.
- dask-ml - Distributed and parallel machine learning.  
- Distributed - Distributed computation in Python.
- mlflow - Open source platform for the machine learning lifecycle.
- Neptune - A lightweight ML experiment tracking, results visualization, and management tool.
- dvc - Data Version Control | Git for Data & Models | ML Experiments Management.
- envd - 🏕️ machine learning development environment for data science and AI/ML engineering teams.
- Sacred - A tool to help you configure, organize, log, and reproduce experiments.
- Ax - Adaptive Experimentation Platform.  
- great_expectations - Always know what to expect from your data.
- pandera - A lightweight, flexible, and expressive statistical data testing library.
- deepchecks - Validation & testing of ML models and data during model development, deployment, and production.  
- evidently - Evaluate and monitor ML models from validation to production.
- TensorFlow Data Validation - Library for exploring and validating machine learning data.
- recmetrics - Library of useful metrics and plots for evaluating recommender systems.
- Metrics - Machine learning evaluation metrics.
- sklearn-evaluation - Model evaluation made easy: plots, tables, and markdown reports.  
- AI Fairness 360 - Fairness metrics for datasets and ML models, explanations, and algorithms to mitigate bias in datasets and models.
- numpy - The fundamental package needed for scientific computing with Python.
- Dask - Parallel computing with task scheduling.  
- bottleneck - Fast NumPy array functions written in C.
- CuPy - NumPy-like API accelerated with CUDA.
- scikit-tensor - Python library for multilinear algebra and tensor factorizations.
- numdifftools - Solve automatic numerical differentiation problems in one or more variables.
- quaternion - Add built-in support for quaternions to numpy.
- adaptive - Tools for adaptive and parallel sampling of mathematical functions.
- NumExpr - A fast numerical expression evaluator for NumPy that comes with an integrated computing virtual machine to speed calculations up by avoiding memory allocation for intermediate results.
- BeautifulSoup - The easiest library for beginners to scrape static websites.
- Scrapy - Fast and extensible scraping library. Lets you write rules and create customized scrapers without touching the core.
- Selenium - Python API for accessing all the functionality of Selenium WebDriver, driving a browser the way a real user would.
- Pattern - High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also includes NLP, machine learning algorithms, and visualization.
- twitterscraper - Efficient library for scraping Twitter.
- qiskit - An open-source SDK for working with quantum computers at the level of circuits, algorithms, and application modules.
- cirq - A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum (NISQ) circuits.
- PennyLane - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.
- QML - A Python Toolkit for Quantum Machine Learning.
- sklearn-porter - Transpile trained scikit-learn estimators to C, Java, JavaScript, and others.
- ONNX - Open Neural Network Exchange.
- MMdnn - A set of tools to help users inter-operate among different deep learning frameworks.
Contributions are welcome! 😎 Read the contribution guidelines.
This work is licensed under the Creative Commons Attribution 4.0 International License - CC BY 4.0
