A curated list of recommended Python frameworks, libraries, software and resources, all particularly useful for scientific Python users.
Intended for students and researchers in the sciences who want to get the most out of the open-source Python ecosystem. Aims to provide a list of tools useful for common tasks for scientists, without mentioning things which they are unlikely ever to need (e.g. authentication, databases, networking, NLP).
There is a section of must-haves for beginners.
List inspired by awesome-python, which is a great similar resource for anything else you might want to do with Python!
Some libraries appear multiple times where they are useful in multiple ways.
- Python-for-Science
- Algebra
- Animations
- Bayesian Analysis
- Better Scientific Software
- Code Quality
- Data Storage
- Debugging
- Development Environments
- Documentation
- Domain-specific
- Error Handling
- Forecasting
- Gotchas
- GPU Acceleration
- Graphical Interfaces
- Job Scheduling
- Labelled data
- Mathematical Library Functions
- Numerical Data
- Optimisation
- Package Management
- Parallelization
- Physical Units
- Plotting
- Presentations
- Profiling and Benchmarking
- Scripting
- Speed
- Statistics
- Testing
- Visualisation
- Workflow
- Beginners Recommendations
Libraries for manipulation of symbolic algebra, analytic integration etc.
- SymPy - A Python library for symbolic mathematics: symbolic algebra, calculus, equation solving, and analytic integration.
- sagemath - Mathematical software system with features covering multiple aspects of mathematics, including algebra, combinatorics, numerical mathematics, number theory, and calculus.
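As a taste of what SymPy can do, a minimal sketch of symbolic differentiation and analytic integration (assuming `sympy` is installed):

```python
import sympy as sp

x = sp.symbols('x')

# Symbolic differentiation: d/dx of x**2 * sin(x)
expr = x**2 * sp.sin(x)
derivative = sp.diff(expr, x)  # 2*x*sin(x) + x**2*cos(x)

# Analytic (indefinite) integration of cos(x)
integral = sp.integrate(sp.cos(x), x)  # sin(x)

print(derivative)
print(integral)
```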
- animatplot - A wrapper around matplotlib's `FuncAnimation` - makes it very easy to animate matplotlib plots.
- Better Scientific Software - Articles and resources on how to write better scientific software.
- PEP8 - The official style guide for Python code.
- flake8 - A tool which checks your code for errors and PEP8 style compliance.
- pycodestyle - Checks your code against the style conventions in PEP8.
- structure - The officially recommended way to structure any python project.
- pdb - The Python debugger, part of the standard library.
Programs to write code into. The main choice is between a software-engineering style IDE, and one intended specifically for scientific users.
- JupyterLab - An IDE which incorporates Jupyter notebooks.
- PyCharm - Very powerful IDE for python. Use if you want the full power a software engineer would expect. Has a professional version, which is free for students.
- spyder - MATLAB-like development environment for scientific python users.
- sphinx - Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, from the docstrings in your code. Originally created for documenting the python language itself.
- nbconvert - Convert jupyter notebooks to other formats such as PDF, LaTeX, HTML.
Libraries of tools developed for python users in various fields of science.
- astropy - Various tools and functionality for astronomy and astrophysics.
- Biopython - Tools for biological computation.
- geoviews - Makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research.
- MetPy - MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data.
- NetworkX - A package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
- nilearn - Machine learning for Neuro-Imaging in python.
- PlasmaPy - Various tools for plasma physics.
- psychopy - An open-source application allowing you to run a wide range of neuroscience, psychology and psychophysics experiments.
- pyrocko - A seismology toolkit for python.
- scikit-beam - Data analysis tools for X-Ray, Neutron and Electron sciences.
- scikit-spectra - A community developed python package for spectroscopy.
- SunPy - SunPy is a data-analysis environment specializing in providing the software necessary to analyze solar and heliospheric data in Python.
- TomoPy - Package for tomographic data processing and image reconstruction.
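As a taste of one of these domain libraries, a minimal NetworkX sketch (assuming `networkx` is installed; the graph is illustrative):

```python
import networkx as nx

# Build a small undirected graph from an edge list
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")])

# Basic structural queries
print(G.number_of_nodes())            # 4
print(nx.shortest_path(G, "a", "c"))  # a path of length 2, e.g. ['a', 'b', 'c']
```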
- errors - Raise meaningful exceptions instead of returning special values or printing error messages.
- warnings - Throw proper warnings instead of using print statements. Python standard library module.
- logging - Record what your program is doing with configurable severity levels, instead of print statements. Python standard library module.
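A minimal sketch of warnings and logging used together in an analysis function (the function and messages are illustrative; both modules are in the standard library):

```python
import logging
import warnings

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def estimate_mean(values):
    """Toy analysis function illustrating warnings vs. logging."""
    if len(values) < 10:
        # A recoverable data-quality issue the caller should know about
        warnings.warn("fewer than 10 samples; estimate may be noisy")
    # Routine progress information goes to the log, not the console
    log.info("averaging %d values", len(values))
    return sum(values) / len(values)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = estimate_mean([1.0, 2.0, 3.0])

print(result)       # 2.0
print(len(caught))  # 1 warning was issued
```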
- prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth. Developed by Facebook.
- pyqt5 - Python bindings for the Qt framework, for building cross-platform graphical user interfaces.
- experi - An interface for managing computational experiments with many independent variables.
- papermill - A tool for parameterizing, executing, and analyzing multiple Jupyter Notebooks.
- scipy - The standard resource for all kinds of mathematical functions.
- xrft - Discrete Fourier transform operations for xarray data structures.
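A minimal sketch of the kind of ready-made routines scipy provides (assuming `scipy` and `numpy` are installed):

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi (exact answer: 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Find a root of cos(x) bracketed in [0, 2] (exact answer: pi/2)
root = optimize.brentq(np.cos, 0, 2)

print(area)  # ~2.0
print(root)  # ~1.5708
```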
- numpy - The fundamental package for numerical computation in python. So ubiquitous that it might as well be part of python's standard library at this point. Ultimately just a contiguous-in-memory C array, wrapped very nicely with python.
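A minimal sketch of the vectorized style numpy encourages, where whole-array operations replace explicit Python loops:

```python
import numpy as np

# Operations act on whole arrays at once, in compiled C
x = np.linspace(0.0, 1.0, 5)  # [0.0, 0.25, 0.5, 0.75, 1.0]
y = x**2 + 1.0                # elementwise, no Python loop

# Reductions and boolean masking
print(y.sum())     # 6.875
print(y[y > 1.2])  # [1.25, 1.5625, 2.0]
```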
- nlopt - Library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization.
Keep track of module dependencies, python versions, and virtual environments.
- conda - A package manager specifically intended for use by the scientific python community. Developed by the authors of numpy to manage both python packages and the underlying C/Fortran libraries which make them fast. Also obviates the need for system virtual environments.
- anaconda - Conda, but packaged with a wide range of useful scientific python libraries, including many from this list.
- pip - The standard way to install python packages. Use it when a package isn't available through conda; the two play nicely together.
- setuptools - For when you make your own module, and want to install it properly into your conda environment (so you never need to touch your `$PYTHONPATH`!)
Use all the cores of your machine, and scale up to clusters!
- dask - Tools for splitting up computations and executing them across many processors in parallel. dask.array in particular provides a numpy-like interface to a chunked-in-memory array. Dask is especially useful for analysing datasets which are larger than your RAM.
- xarray - Employs dask behind the scenes to parallelize most operations. Simply load your dataset in "chunks" and xarray will operate on each chunk in parallel:

```python
# Load data in chunks
ds = open_dataset('data.nc', chunks={'space': 100})

# Will operate on each spatial chunk in parallel using dask
ds['density'].mean(dim='time')
```

Producing static plots of publication quality.
- matplotlib - The standard plotting library for python; produces publication-quality figures in a wide range of formats.
- anatomy of matplotlib - Tutorial on how matplotlib is structured.
- scientific-matplotlib -
- seaborn - A statistical data visualisation library built on top of matplotlib.
- xarray.plot - Submodule of xarray which makes plotting into a one-line job: `data['density'].plot()`
- colorcet - A set of useful perceptually uniform colormaps for plotting scientific data.
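A minimal matplotlib sketch of a labelled figure saved at publication-quality resolution (the file name, figure size, and dpi are illustrative):

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts and CI
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, np.sin(x), label=r"$\sin(x)$")
ax.set_xlabel("x")
ax.set_ylabel("amplitude")
ax.legend()
fig.tight_layout()

# Vector formats (pdf, svg) or a high dpi are best for publication
outfile = Path(tempfile.mkdtemp()) / "sine.png"
fig.savefig(outfile, dpi=300)
```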
- Binder - Online Jupyter Notebook hosting for GitHub repositories. Allows users to run Jupyter notebooks from GitHub repositories in the cloud, without Python installed locally.
- nb_pdf_template - A more accurate representation of jupyter notebooks when converting to pdfs.
- RISE - A plugin for Jupyter which turns notebooks into slick presentations.
- jupyter-rise - Auto-launch RISE slideshows from Binder.
- py-spy - A profiler for python code which doesn't interfere with the running process.
Tools which are likely to be useful when writing python scripts to automate common tasks.
- click - Run your scripts from the command line, with as little extra code as possible.
- dateutil - Provides powerful extensions to the standard datetime module available in Python.
- gitpython - Interact with git from python. Useful for tasks like checking if your simulation code has uncommitted changes before executing it.
- pathlib - Use this anytime you want to do anything with a file path. Obviates the need for `os` and `sys` most of the time. A module in the python standard library.
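A minimal sketch of the pathlib style described above (the directory layout and file names are illustrative):

```python
import tempfile
from pathlib import Path

# A scratch directory for the example
base = Path(tempfile.mkdtemp())

# Joining paths with '/' replaces os.path.join
data_file = base / "results" / "run_01.csv"
data_file.parent.mkdir(parents=True, exist_ok=True)
data_file.write_text("t,density\n0,1.0\n")

# Globbing and inspection without os or glob
csvs = sorted(data_file.parent.glob("*.csv"))
print(csvs[0].name)      # run_01.csv
print(data_file.suffix)  # .csv
```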
Python inevitably sacrifices some speed to gain increased clarity. Scientific programs usually have one or two functions which do 90% of the work, and there are various ways to dramatically speed these up. Use in conjunction with parallelization through dask if you want as much speed as possible.
- cython - A compiler which allows you to write snippets of C code into your python for massive speed increases.
- F2PY - For calling fast, compiled Fortran subroutines from Python (part of SciPy)
- numba - Automatic just-in-time compilation of Python functions to fast machine code.
- bottleneck - A collection of fast numpy array functions written in C.
- Theano - Allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
- statsmodels - Provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.
Check that your code actually does what you think it will do!
- pytest - The standard unit testing framework for python. Essential - if you're not unit-testing your calculations then you are merely hoping that they are actually doing what you think they are. `pytest` does a lot of magic behind the scenes to make it as simple as possible to use, with no boilerplate.
- pytest-clarity - A plugin which improves the readability of pytest output.
- hypothesis - Property-based testing for python. Normal tests check that your function behaves as expected for some specific input. Hypothesis tests check that your function behaves as expected for any input of some type, e.g. any string, or any numpy array. Basically magic, compatible with pytest, and the algorithms used in the implementation are very interesting.
- cosmic-ray - Mutation testing in python. Checks that your test coverage is robust by randomly changing pieces of your code and checking that this change is actually caught by a test failing.
- flaky - pytest plugin for automatically re-running inconsistent ("flaky") tests.
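A minimal sketch of the pytest style: save functions like these in a file named `test_*.py` and run `pytest` on the command line (the function under test is illustrative; the plain `assert` statements are all pytest needs):

```python
def running_mean(values):
    """Cumulative mean of a sequence of numbers."""
    total = 0.0
    means = []
    for i, v in enumerate(values, start=1):
        total += v
        means.append(total / i)
    return means

# pytest automatically discovers and runs functions named test_*
def test_running_mean_simple():
    assert running_mean([1.0, 2.0, 3.0]) == [1.0, 1.5, 2.0]

def test_running_mean_empty():
    assert running_mean([]) == []
```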
- animatplot - A wrapper around matplotlib's `FuncAnimation` - makes it very easy to animate matplotlib plots.
- mayavi - 3D scientific data visualization and plotting in Python.
- cartopy - A library for cartographic projections and plots, with matplotlib support.
- bokeh - Interactive visualisation library that targets modern web browsers.
- plotly - Interactive, browser-based graphing library for python.
- holoviews - Stop plotting your data - annotate your data and let it visualise itself.
- ipyvolume - 3d plotting for Python in the Jupyter notebook.
- vispy - Interactive scientific visualisation in python.
- yt - Very powerful software suite for analysing and visualising volumetric data. Written by astrophysicists, but since applied to many other domains.
Don't just write and run python scripts. Tools to make your workflow faster, clearer, and easier to come back to later.
- ipython - Run python interactively, like MATLAB! Forms the backend of Jupyter notebooks.
- jupyter notebooks - Documents which can contain live code, equations, visualisations and narrative text, all in one place.
- jupyterlab - A development environment in which you can write Jupyter notebooks. The spiritual successor to spyder, in that it is designed specifically for scientists.
- papermill - A tool for parameterizing, executing, and analyzing multiple Jupyter Notebooks.
- First, install python through anaconda, which will also give you the packages you're about to use.
- Write your code in either `pycharm` (if you want a professional IDE), `spyder` or `jupyterlab` (if you're used to MATLAB's environment).
- Become familiar with `numpy`, the fundamental numeric object in python, and `matplotlib`, the standard way to plot.
- Next, wrap your data into clearer, higher-level objects with either `pandas` or `xarray` (use `xarray` if your data has more than one dimension).
- Before writing new analysis functions, check if someone has already solved your problem for you in `scipy`, or in one of python's domain-specific scientific software packages.
- As soon as you start writing your own analysis functions, test they're correct with unit tests written with `pytest`.
- Analyse your data interactively with `ipython`, and record your work in a `jupyter notebook`.