Probably the best curated list of data science software in Python
- Machine Learning
 - Deep Learning
 - Data Manipulation
 - Feature Engineering
 - Visualization
 - Model Explanation
 - Reinforcement Learning
 - Probabilistic Methods
 - Genetic Programming
 - Optimization
 - Natural Language Processing
 - Computer Audition
 - Computer Vision
 - Statistics
 - Distributed Computing
 - Experimentation
 - Evaluation
 - Computations
 - Spatial Analysis
 - Quantum Computing
 - Conversion
 
- scikit-learn - Machine learning in Python. 

 - Shogun - Machine learning toolbox.
 - xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package.
 - cuML - RAPIDS Machine Learning Library. 
 
 - modAL - Modular active learning framework for Python3. 

 - Sparkit-learn - PySpark + scikit-learn = Sparkit-learn. 
 
 - mlpack - A scalable C++ machine learning library (Python bindings).
 - dlib - Toolkit for making real world machine learning and data analysis applications in C++ (Python bindings).
 - MLxtend - Extension and helper modules for Python's data analysis and machine learning libraries. 

 - hyperlearn - 50%+ faster and 50%+ less RAM usage, with GPU support; a rewrite of scikit-learn and Statsmodels. 
 
 - Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans. 

 - scikit-multilearn - Multi-label classification for Python. 

 - seqlearn - Sequence classification toolkit for Python. 

 - pystruct - Simple structured learning framework for Python. 

 - sklearn-expertsys - Highly interpretable classifiers for scikit-learn, producing easily understood decision rules instead of black-box models. 

 - RuleFit - Implementation of the RuleFit algorithm. 

 - metric-learn - Metric learning algorithms in Python. 

 - pyGAM - Generalized Additive Models in Python.
 - Karate Club - An unsupervised machine learning library for graph structured data.
 - Little Ball of Fur - A library for sampling graph structured data.
 - causalml - Uplift modeling and causal inference with machine learning algorithms. 

 
- tslearn - Machine learning toolkit dedicated to time-series data. 

 - tick - Module for statistical learning, with a particular emphasis on time-dependent modelling. 

 - Prophet - Automatic Forecasting Procedure.
 - PyFlux - Open source time series library for Python.
 - bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
 - luminol - Anomaly Detection and Correlation library.
 
- TPOT - Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. 

 - auto-sklearn - An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. 

 - MLBox - A powerful Automated Machine Learning Python library.
 
- ML-Ensemble - High performance ensemble learning. 

 - Stacking - Simple and useful stacking library, written in Python. 

 - stacked_generalization - Library for machine learning stacked generalization. 

 - vecstack - Python package for stacking (machine learning technique). 

 
- imbalanced-learn - Module to perform undersampling and oversampling with various techniques. 

 - imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data. 
 
 
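imbalanced-learn and imbalanced-algorithms implement many resampling strategies. As a minimal, stdlib-only sketch of the simplest one, random oversampling, the code below duplicates randomly chosen minority-class rows until every class has as many samples as the largest class. It is not imbalanced-learn's API; `random_oversample` is a hypothetical helper written for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate random rows of each smaller class
    until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)  # copies; the inputs are left untouched
    for label, n in counts.items():
        # Indices of all rows that belong to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sample (with replacement) exactly the number of extra rows needed
        for i in rng.choices(idx, k=target - n):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Sampling with replacement only copies existing rows; techniques such as SMOTE instead synthesize new points, which is part of why dedicated libraries offer many strategies.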
- rpforest - A forest of random projection trees. 

 - sklearn-random-bits-forest - Wrapper of the Random Bits Forest program written by (Wang et al., 2016).

 - rgf_python - Python Wrapper of Regularized Greedy Forest. 

 
- Python-ELM - Extreme Learning Machine implementation in Python. 

 - Python Extreme Learning Machine (ELM) - A machine learning technique used for classification/regression tasks.
 - hpelm - High performance implementation of Extreme Learning Machines (fast randomized neural networks). 

 
- pyFM - Factorization machines in Python. 

 - fastFM - A library for Factorization Machines. 

 - tffm - TensorFlow implementation of an arbitrary order Factorization Machine. 
 
 - liquidSVM - An implementation of SVMs.
 - scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API. 

 - ThunderSVM - A fast SVM Library on GPUs and CPUs. 
 
 
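pyFM, fastFM and tffm all fit factorization machines. As a library-independent illustration (not the API of any of these packages), the sketch below computes the standard second-order FM prediction, ŷ(x) = w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, twice: once with a naive loop over feature pairs and once with the well-known O(k·n) reformulation Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j = ½ Σ_f [(Σ_i v_{i,f} x_i)² − Σ_i v_{i,f}² x_i²]. The function names and toy parameters are hypothetical.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM prediction by looping over every feature pair: O(k * n^2)."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))  # bias + linear terms
    for i in range(n):
        for j in range(i + 1, n):
            # Pairwise interaction weight is the dot product of latent vectors
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same prediction via the sum-of-squares identity: O(k * n)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))          # sum_i v_if * x_i
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i (v_if * x_i)^2
        y += 0.5 * (s * s - s2)
    return y


# Toy parameters made up for illustration: 4 features, k = 2 latent factors
x = [1.0, 2.0, 0.0, 3.0]
w0 = 0.5
w = [0.1, -0.2, 0.3, 0.4]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.1]]

print(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V))
```

Both functions should agree up to floating-point rounding. The second form avoids enumerating all feature pairs, which is what makes FMs practical on sparse, high-dimensional inputs.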
- XGBoost - Scalable, Portable and Distributed Gradient Boosting. 
 
 - LightGBM - A fast, distributed, high-performance gradient boosting framework. 
 
 - CatBoost - An open-source gradient boosting on decision trees library. 
 
 - ThunderGBM - Fast GBDTs and Random Forests on GPUs. 
 
 
- PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration. 

 - torchvision - Datasets, Transforms and Models specific to Computer Vision. 

 - torchtext - Data loaders and abstractions for text and NLP. 

 - torchaudio - An audio library for PyTorch. 

 - ignite - High-level library to help with training neural networks in PyTorch. 

 - PyToune - A Keras-like framework and utilities for PyTorch.
 - skorch - A scikit-learn compatible neural network library that wraps PyTorch. 
 
 - PyTorchNet - An abstraction to train neural networks. 

 - pytorch_geometric - Geometric Deep Learning Extension Library for PyTorch. 

 - Catalyst - High-level utils for PyTorch DL & RL research. 

 
- TensorFlow - Computation using data flow graphs for scalable machine learning by Google. 

 - TensorLayer - Deep Learning and Reinforcement Learning Library for Researchers and Engineers. 

 - TFLearn - Deep learning library featuring a higher-level API for TensorFlow. 

 - Sonnet - TensorFlow-based neural network library. 

 - tensorpack - A Neural Net Training Interface on TensorFlow. 

 - Polyaxon - A platform that helps you build, manage and monitor deep learning models. 

 - NeuPy - A Python library for Artificial Neural Networks and Deep Learning.
 - tfdeploy - Deploy TensorFlow graphs for fast evaluation and export to TensorFlow-less environments running NumPy. 

 - tensorflow-upstream - TensorFlow ROCm port. 
 
 - TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow. 

 - tensorlm - Wrapper library for text generation / language models at char and word level with RNN. 

 - TensorLight - A high-level framework for TensorFlow. 

 - Mesh TensorFlow - Model Parallelism Made Easier. 

 - Ludwig - A toolbox that allows you to train and test deep learning models without the need to write code. 

 
- Keras - A high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. 

 - keras-contrib - Keras community contributions. 

 - Hyperas - Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization. 

 - Elephas - Distributed Deep learning with Keras & Spark. 

 - Hera - Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser. 

 - Spektral - Deep learning on graphs. 

 - qkeras - A quantization deep learning library. 

 
- MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dependency Scheduler. 

 - Gluon - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet). 

 - MXbox - Simple, efficient and flexible vision toolbox for the MXNet framework. 

 - gluon-cv - Provides implementations of the state-of-the-art deep learning models in computer vision. 

 - gluon-nlp - NLP made easy. 

 - Xfer - Transfer Learning library for Deep Neural Networks. 

 - MXNet - HIP Port of MXNet. 
 
 
- Chainer - A flexible framework for neural networks.
 - ChainerCV - A Library for Deep Learning in Computer Vision.
 - ChainerMN - Scalable distributed deep learning with Chainer.
 
WARNING: Theano development has been stopped
- Theano - A Python library that allows you to define, optimize, and evaluate mathematical expressions.

 - Lasagne - Lightweight library to build and train neural networks in Theano. 

 - nolearn - A scikit-learn compatible neural network library (mainly for Lasagne). 
 
 - Blocks - A Theano framework for building and training neural networks. 

 - scikit-neuralnetwork - Deep neural networks without the learning cliff. 
 
 - platoon - Multi-GPU mini-framework for Theano. 

 - Theano-MPI - MPI Parallel framework for training deep learning models built in Theano. 

 
- CNTK - Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit.
 - Neon - Intel® Nervana™ reference deep learning framework committed to best performance on all hardware.
 - Tangent - Source-to-Source Debuggable Derivatives in Pure Python.
 - autograd - Efficiently computes derivatives of numpy code.
 - Myia - Deep Learning framework (pre-alpha).
 - nnabla - Neural Network Libraries by Sony.
 - Caffe - A fast open framework for deep learning.
 - Caffe2 - A lightweight, modular, and scalable deep learning framework (now a part of PyTorch).
 - hipCaffe - The HIP port of Caffe. 

 
- pandas - Powerful Python data analysis toolkit.
 - cuDF - GPU DataFrame Library. 
 
 - blaze - NumPy and pandas interface to Big Data. 

 - pandasql - Allows you to query pandas DataFrames using SQL syntax. 

 - pandas-gbq - pandas interface to Google BigQuery. 

 - xpandas - Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.
 - pysparkling - A pure Python implementation of Apache Spark's RDD and DStream interfaces. 

 - Arctic - High performance datastore for time series and tick data.
 - datatable - Data.table for Python. 

 - koalas - pandas API on Apache Spark. 

 - modin - Speed up your pandas workflows by changing a single line of code. 

 - swifter - A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner.
 - pandas_flavor - A package that lets you easily write your own flavor of pandas.
 - pandas-log - A package that provides feedback about basic pandas operations and helps find both business-logic and performance issues.
 
- pdpipe - Easy pipelines for pandas DataFrames.
 - SSPipe - Python pipe (|) operator with support for DataFrames, NumPy and PyTorch.
 - pandas-ply - Functional data manipulation for pandas. 

 - Dplython - Dplyr for Python. 

 - sklearn-pandas - pandas integration with sklearn. 
 
 - Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
 - pyjanitor - Clean APIs for data cleaning. 

 - meza - A Python toolkit for processing tabular data.
 - Prodmodel - Build system for data science pipelines.
 - dopanda - Hints and tips for using pandas in an analysis environment. 

 
- Featuretools - Automated feature engineering.
 - skl-groups - A scikit-learn addon to operate on set/"group"-based features. 

 - Feature Forge - A set of tools for creating and testing machine learning features. 

 - few - A feature engineering wrapper for sklearn. 

 - scikit-mdr - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. 

 - tsfresh - Automatic extraction of relevant features from time series. 

 
- scikit-feature - Feature selection repository in Python.
 - boruta_py - Implementations of the Boruta all-relevant feature selection method. 

 - BoostARoota - A fast xgboost feature selection algorithm. 

 - scikit-rebate - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. 

 
- Matplotlib - Plotting with Python.
 - seaborn - Statistical data visualization using matplotlib.
 - Bokeh - Interactive Web Plotting for Python.
 - HoloViews - Stop plotting your data - annotate your data and let it visualize itself.
 - prettyplotlib - Painlessly create beautiful matplotlib plots.
 - python-ternary - Ternary plotting library for Python with matplotlib.
 - missingno - Missing data visualization module for Python.
 - chartify - Python library that makes it easy for data scientists to create charts.
 - physt - Improved histograms.
 - animatplot - A Python package for animating plots, built on matplotlib.
 - plotly - A Python library that makes interactive and publication-quality graphs.
 
- Alibi - Algorithms for monitoring and explaining machine learning models.
 - anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
 - aequitas - Bias and Fairness Audit Toolkit.
 - Contrastive Explanation - Contrastive Explanation (Foil Trees). 

 - yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection. 

 - scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects. 

 - shap - A unified approach to explain the output of any machine learning model. 

 - ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
 - Lime - Explaining the predictions of any machine learning classifier. 

 - FairML - A Python toolbox for auditing machine learning models for bias. 

 - L2X - Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation.
 - PDPbox - Partial dependence plot toolbox.
 - pyBreakDown - Python implementation of R package breakDown. 


 - PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
 - Skater - Python Library for Model Interpretation.
 - model-analysis - Model analysis tools for TensorFlow. 

 - themis-ml - A library that implements fairness-aware machine learning algorithms. 

 - treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. 

 - AI Explainability 360 - Interpretability and explainability of data and machine learning models.
 - Auralisation - Auralisation of learned features in CNN (for audio).
 - CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
 - lucid - A collection of infrastructure and tools for research in neural network interpretability.
 - Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
 - FlashLight - Visualization tool for your neural network.
 - tensorboard-pytorch - TensorBoard for PyTorch (and Chainer, MXNet, NumPy, ...).
 - mxboard - Logging MXNet data for visualization in TensorBoard. 

 
- OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
 - Coach - Easy experimentation with state-of-the-art Reinforcement Learning algorithms.
 - garage - A toolkit for reproducible reinforcement learning research.
 - OpenAI Baselines - High-quality implementations of reinforcement learning algorithms.
 - Stable Baselines - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
 - RLlib - Scalable Reinforcement Learning.
 - Horizon - A platform for Applied Reinforcement Learning.
 - TF-Agents - A library for Reinforcement Learning in TensorFlow. 

 - TensorForce - A TensorFlow library for applied reinforcement learning. 

 - TRFL - TensorFlow Reinforcement Learning. 

 - Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
 - keras-rl - Deep Reinforcement Learning for Keras. 

 - ChainerRL - A deep reinforcement learning library built on top of Chainer.
 
- pomegranate - Probabilistic and graphical models for Python. 

 - pyro - A flexible, scalable deep probabilistic programming library built on PyTorch. 

 - ZhuSuan - Bayesian Deep Learning. 

 - PyMC - Bayesian Stochastic Modelling in Python.
 - PyMC3 - Python package for Bayesian statistical modeling and Probabilistic Machine Learning. 

 - sampled - Decorator for reusable models in PyMC3.
 - Edward - A library for probabilistic modeling, inference, and criticism. 

 - InferPy - Deep Probabilistic Modelling Made Easy. 

 - GPflow - Gaussian processes in TensorFlow. 

 - PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
 - gelato - Bayesian dessert for Lasagne. 

 - sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. 

 - skggm - Estimation of general graphical models. 

 - pgmpy - A Python library for working with Probabilistic Graphical Models.
 - skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute. 

 - Aboleth - A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation. 

 - PtStat - Probabilistic Programming and Statistical Inference in PyTorch. 

 - PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. 

 - emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
 - hsmmlearn - A library for hidden semi-Markov models with explicit durations.
 - pyhsmm - Bayesian inference in HSMMs and HMMs.
 - GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. 

 - MXFusion - Modular Probabilistic Programming on MXNet. 

 - sklearn-crfsuite - A scikit-learn inspired API for CRFsuite. 

 
- gplearn - Genetic Programming in Python. 

 - DEAP - Distributed Evolutionary Algorithms in Python.
 - karoo_gp - A Genetic Programming platform for Python with GPU support. 

 - monkeys - A strongly-typed genetic programming framework for Python.
 - sklearn-genetic - Genetic feature selection module for scikit-learn. 

 
- Spearmint - Bayesian optimization.
 - BoTorch - Bayesian optimization in PyTorch. 

 - scikit-opt - Heuristic Algorithms for optimization.
 - SMAC3 - Sequential Model-based Algorithm Configuration.
 - Optunity - A library containing various optimizers for hyperparameter tuning.
 - hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
 - hyperopt-sklearn - Hyper-parameter optimization for sklearn. 

 - sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. 

 - sigopt_sklearn - SigOpt wrappers for scikit-learn methods. 

 - Bayesian Optimization - A Python implementation of global optimization with Gaussian processes.
 - SafeOpt - Safe Bayesian Optimization.
 - scikit-optimize - Sequential model-based optimization with a scipy.optimize interface.
 - Solid - A comprehensive gradient-free optimization framework written in Python.
 - PySwarms - A research toolkit for particle swarm optimization in Python.
 - Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
 - GPflowOpt - Bayesian Optimization using GPflow. 

 - POT - Python Optimal Transport library.
 - Talos - Hyperparameter Optimization for Keras Models.
 - nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).
 
- NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
 - CLTK - The Classical Language Toolkit.
 - gensim - Topic Modelling for Humans.
 - PSI-Toolkit - A natural language processing toolkit.
 - pyMorfologik - Python binding for Morfologik.
 - skift - Scikit-learn wrappers for Python fastText. 

 - Phonemizer - Simple text to phonemes converter for multiple languages.
 - flair - Very simple framework for state-of-the-art NLP.
 - spaCy - Industrial-Strength Natural Language Processing.
 
- librosa - Python library for audio and music analysis.
 - Yaafe - Audio features extraction.
 - aubio - A library for audio and music analysis.
 - Essentia - Library for audio and music analysis, description and synthesis.
 - LibXtract - A simple, portable, lightweight library of audio feature extraction functions.
 - Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals.
 - muda - A library for augmenting annotated audio data.
 - madmom - Python audio and music signal processing library.
 
- OpenCV - Open Source Computer Vision Library.
 - scikit-image - Image Processing SciKit (Toolbox for SciPy).
 - imgaug - Image augmentation for machine learning experiments.
 - imgaug_extension - Additional augmentations for imgaug.
 - Augmentor - Image augmentation library in Python for machine learning.
 - albumentations - Fast image augmentation library and easy to use wrapper around other libraries.
 
- pandas_summary - Extension to the pandas DataFrame describe function. 

 - Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects. 

 - statsmodels - Statistical modeling and econometrics in Python.
 - stockstats - Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.
 - weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
 - scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
 - Alphalens - Performance analysis of predictive (alpha) stock factors.
 
- Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. 

 - PySpark - Exposes the Spark programming model to Python. 

 - Veles - Distributed machine learning platform.
 - Jubatus - Framework and Library for Distributed Online Machine Learning.
 - DMTK - Microsoft Distributed Machine Learning Toolkit.
 - PaddlePaddle - PArallel Distributed Deep LEarning.
 - dask-ml - Distributed and parallel machine learning. 

 - Distributed - Distributed computation in Python.
 
- Sacred - A tool to help you configure, organize, log and reproduce experiments.
 - Xcessiv - A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling.
 - Persimmon - A visual dataflow programming language for sklearn.
 - Ax - Adaptive Experimentation Platform. 

 - Neptune - A lightweight ML experiment tracking, results visualization and management tool.
 
- recmetrics - Library of useful metrics and plots for evaluating recommender systems.
 - Metrics - Machine learning evaluation metrics.
 - sklearn-evaluation - Model evaluation made easy: plots, tables and markdown reports. 

 - AI Fairness 360 - Fairness metrics for datasets and ML models, explanations and algorithms to mitigate bias in datasets and models.
 
- numpy - The fundamental package needed for scientific computing with Python.
 - Dask - Parallel computing with task scheduling. 

 - bottleneck - Fast NumPy array functions written in C.
 - CuPy - NumPy-like API accelerated with CUDA.
 - scikit-tensor - Python library for multilinear algebra and tensor factorizations.
 - numdifftools - Solve automatic numerical differentiation problems in one or more variables.
 - quaternion - Add built-in support for quaternions to numpy.
 - adaptive - Tools for adaptive and parallel sampling of mathematical functions.
 
- PennyLane - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.
 - QML - A Python Toolkit for Quantum Machine Learning.
 
- sklearn-porter - Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
 - ONNX - Open Neural Network Exchange.
 - MMdnn - A set of tools to help users inter-operate among different deep learning frameworks.
 
Contributions are welcome! 😎 
 Read the contribution guidelines.
This work is licensed under the Creative Commons Attribution 4.0 International License - CC BY 4.0
